FEATURE-BASED VIDEO COMPRESSION METHOD

FIELD OF THE INVENTION 

The present invention relates to processes for compressing digital video
signals and, in particular, to an object-based digital video encoding process with
error feedback to increase accuracy.

BACKGROUND OF THE INVENTION 

Full-motion video displays based upon analog video signals have long been
available in the form of television. With recent increases in computer processing
capabilities and affordability, full-motion video displays based upon digital video
signals are becoming more widely available. Digital video systems can provide
significant improvements over conventional analog video systems in creating,
modifying, transmitting, storing, and playing full-motion video sequences.

Digital video displays include large numbers of image frames that are played
or rendered successively at frequencies of between 30 and 75 Hz. Each image frame
is a still image formed from an array of pixels according to the display resolution of
a particular system. As examples, VHS-based systems have display resolutions of
320x480 pixels, NTSC-based systems have display resolutions of 720x486 pixels,
and high-definition television (HDTV) systems under development have display
resolutions of 1360x1024 pixels.

The amounts of raw digital information included in video sequences are
massive. Storage and transmission of these amounts of video information are
infeasible with conventional personal computer equipment. With reference to a
digitized form of a relatively low resolution VHS image format having a 320x480
pixel resolution, a full-length motion picture of two hours in duration could
correspond to 100 gigabytes of digital video information. By comparison,
conventional compact optical disks have capacities of about 0.6 gigabytes, magnetic
hard disks have capacities of 1-2 gigabytes, and compact optical disks under
development have capacities of up to 8 gigabytes.
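As a rough check on the 100-gigabyte figure, the following Python sketch computes the uncompressed size of a two-hour 320x480 sequence. The color depth of 3 bytes per pixel (8 bits for each of three color components) and the rate of 30 frames per second are assumptions picked from ranges mentioned elsewhere in this document; the text does not state which values underlie its estimate.

```python
# Assumed parameters (not stated for this estimate in the text).
WIDTH, HEIGHT = 320, 480      # VHS-class digitized resolution cited above
BYTES_PER_PIXEL = 3           # assumption: 8 bits for each of 3 color components
FRAMES_PER_SECOND = 30        # assumption: lower end of the 30-75 Hz range
DURATION_S = 2 * 60 * 60      # a two-hour motion picture


def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Uncompressed size of a video sequence, in bytes."""
    return width * height * bytes_per_pixel * fps * seconds


total = raw_video_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL, FRAMES_PER_SECOND, DURATION_S)
print(f"{total / 1e9:.1f} GB")  # prints "99.5 GB"
```

Under these assumptions the result lands within about half a percent of the 100-gigabyte figure quoted above.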

In response to the limitations in storing or transmitting such massive amounts 
of digital video information, various video compression standards or processes have
been established, including MPEG-1, MPEG-2, and H.26X. These conventional 
video compression techniques utilize similarities between successive image frames, 
referred to as temporal or interframe correlation, to provide interframe compression 
in which pixel-based representations of image frames are converted to motion 
representations. In addition, the conventional video compression techniques utilize 
similarities within image frames, referred to as spatial or intraframe correlation, to 
provide intraframe compression in which the motion representations within an image 
frame are further compressed. Intraframe compression is based upon conventional 
processes for compressing still images, such as discrete cosine transform (DCT) 
encoding. 

Although differing in specific implementations, the MPEG-1, MPEG-2, and
H.26X video compression standards are similar in a number of respects. The
following description of the MPEG-2 video compression standard is generally
applicable to the others.

MPEG-2 provides interframe compression and intraframe compression based
upon square blocks or arrays of pixels in video images. A video image is divided
into transformation blocks having dimensions of 16x16 pixels. For each
transformation block $T_N$ in an image frame N, a search is performed across the
image of an immediately preceding image frame N-1 or also a next successive video
frame N+1 (i.e., bidirectionally) to identify the most similar respective
transformation blocks $T_{N-1}$ or $T_{N+1}$.

Ideally, and with reference to a search of the next successive image frame,
the pixels in transformation blocks $T_N$ and $T_{N+1}$ are identical, even if the
transformation blocks have different positions in their respective image frames.
Under these circumstances, the pixel information in transformation block $T_{N+1}$ is
redundant to that in transformation block $T_N$. Compression is achieved by
substituting the positional translation between transformation blocks $T_N$ and $T_{N+1}$
for the pixel information in transformation block $T_{N+1}$. In this simplified example, a
single translational vector $(\Delta X, \Delta Y)$ is designated for the video information
associated with the 256 pixels in transformation block $T_{N+1}$.
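A minimal sketch of the exhaustive block search described above, in Python with NumPy. The sum of absolute differences (SAD) is assumed here as the similarity measure, and `best_match` and `search_radius` are hypothetical names; the text says only that the "most similar" block is sought, and MPEG-2 leaves the motion search strategy to the encoder.

```python
import numpy as np


def best_match(block, ref_frame, top, left, search_radius):
    """Full-search block matching: return the translational vector (dx, dy) that
    minimizes the sum of absolute differences (SAD), together with that SAD."""
    h, w = block.shape
    best = None
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # candidate block would fall outside the reference frame
            candidate = ref_frame[y:y + h, x:x + w]
            # Widen to int32 so uint8 differences cannot wrap around.
            sad = int(np.abs(block.astype(np.int32) - candidate.astype(np.int32)).sum())
            if best is None or sad < best[0]:
                best = (sad, dx, dy)
    sad, dx, dy = best
    return (dx, dy), sad


# Usage: frame N+1 is frame N shifted down by 2 and right by 3 pixels.
rng = np.random.default_rng(0)
frame_n = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
frame_n1 = np.roll(frame_n, shift=(2, 3), axis=(0, 1))
block = frame_n[16:32, 16:32]  # one 16x16 transformation block T_N
vector, sad = best_match(block, frame_n1, top=16, left=16, search_radius=7)
print(vector, sad)  # prints "(3, 2) 0"
```

For a pure translation the residual SAD is zero; with rotation, magnification, or shear no candidate matches exactly, and the nonzero remainder corresponds to the transformation block error E discussed next.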
Frequently, the video information (i.e., pixels) in the corresponding
transformation blocks $T_N$ and $T_{N+1}$ is not identical. The difference between them is
designated a transformation block error E, which often is significant. Although it is
compressed by a conventional compression process such as discrete cosine transform
(DCT) encoding, the transformation block error E is cumbersome and limits the
extent (ratio) and the accuracy by which video signals can be compressed.

Large transformation block errors E arise in block-based video compression
methods for several reasons. The block-based motion estimation represents only
translational motion between successive image frames. The only changes between
corresponding transformation blocks $T_N$ and $T_{N+1}$ that can be represented are changes
in the relative positions of the transformation blocks. A disadvantage of such
representations is that full-motion video sequences frequently include complex
motions other than translation, such as rotation, magnification, and shear.
Representing such complex motions with simple translational approximations results
in significant errors.

Another aspect of video displays is that they typically include multiple image
features or objects that change or move relative to each other. Objects may be
distinct characters, articles, or scenery within a video display. With respect to a
scene in a motion picture, for example, each of the characters (i.e., actors) and
articles (i.e., props) in the scene could be a different object.

The relative motion between objects in a video sequence is another source of
significant transformation block errors E in conventional video compression
processes. Due to the regular configuration and size of the transformation blocks,
many of them encompass portions of different objects. Relative motion between the
objects during successive image frames can result in extremely low correlation (i.e.,
high transformation errors E) between corresponding transformation blocks.
Similarly, the appearance of portions of objects in successive image frames (e.g.,
when a character turns) also introduces high transformation errors E.

Conventional video compression methods appear to be inherently limited due
to the size of transformation errors E. With the increased demand for digital video
display capabilities, improved digital video compression processes are required.
SUMMARY OF THE INVENTION

The present invention includes a video compression encoder process for
compressing digitized video signals representing display motion in video sequences
of multiple image frames. The encoder process utilizes object-based video
compression to improve the accuracy and versatility of encoding interframe motion
and intraframe image features. Video information is compressed relative to objects
of arbitrary configurations, rather than fixed, regular arrays of pixels as in
conventional video compression methods. This reduces the error components and
thereby improves the compression efficiency and accuracy. As another benefit,
object-based video compression of this invention provides interactive video editing
capabilities for processing compressed video information.

In a preferred embodiment, the process or method of this invention includes
identifying image features of arbitrary configuration in a first video image frame and
defining within the image feature multiple distinct feature points. The feature points
of the image feature in the first video image frame are correlated with corresponding
feature points of the image feature in a succeeding second video image frame,
thereby to determine an estimation of the image feature in the second video image
frame. A difference between the estimated and actual image feature in the second
video image frame is determined and encoded in a compressed format.

The encoder process of this invention overcomes the shortcomings of the
conventional block-based video compression methods. The encoder process
preferably uses a multi-dimensional transformation method to represent mappings
between corresponding objects in successive image frames. The multiple dimensions
of the transformation refer to the number of coordinates in its generalized form.
The multi-dimensional transformation is capable of representing complex motion that
includes any or all of translation, rotation, magnification, and shear. As a result,
complex motion of objects between successive image frames may be represented
with relatively low transformation error.

Another source of error in conventional block-based video compression
methods is motion between objects included within a transformation block. The
object-based video compression or encoding of this invention substantially eliminates
the relative motion between objects within transformation blocks. As a result,
transformation error arising from inter-object motion also is substantially decreased.
The low transformation errors arising from the encoder process of this invention
allow it to provide compression ratios up to 300% greater than those obtainable from
prior encoder processes such as MPEG-2.

The foregoing and other features and advantages of the preferred embodiment 
of the present invention will be more readily apparent from the following detailed 
description, which proceeds with reference to the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of a computer system that may be used to 
implement a method and apparatus embodying the invention. 

Figs. 2A and 2B are simplified representations of a display screen of a video
display device showing two successive image frames corresponding to a video
signal.

Fig. 3A is a generalized functional block diagram of a video compression
encoder process for compressing digitized video signals representing display motion
in video sequences of multiple image frames. Fig. 3B is a functional block diagram
of a master object encoder process according to this invention.

Fig. 4 is a functional block diagram of an object segmentation process for
segmenting selected objects from an image frame of a video sequence.

Fig. 5A is a simplified representation of a display screen of the video display
device of Fig. 2A, and Fig. 5B is an enlarged representation of a portion of the
display screen of Fig. 5A.

Fig. 6 is a functional block diagram of a polygon match process for
determining a motion vector for corresponding pairs of pixels in corresponding
objects in successive image frames.

Figs. 7A and 7B are simplified representations of a display screen showing
two successive image frames with two corresponding objects.

Fig. 8 is a functional block diagram of an alternative pixel block correlation
process.
Fig. 9A is a schematic representation of a first pixel block used for
identifying corresponding pixels in different image frames. Fig. 9B is a schematic
representation of an array of pixels corresponding to a search area in a prior image
frame where corresponding pixels are sought. Figs. 9C-9G are schematic
representations of the first pixel block being scanned across the pixel array of Fig.
9B to identify corresponding pixels.

Fig. 10A is a schematic representation of a second pixel block used for
identifying corresponding pixels in different image frames. Figs. 10B-10F are
schematic representations of the second pixel block being scanned across the pixel
array of Fig. 9B to identify corresponding pixels.

Fig. 11A is a schematic representation of a third pixel block used for
identifying corresponding pixels in different image frames. Figs. 11B-11F are
schematic representations of the third pixel block being scanned across the pixel
array of Fig. 9B.

Fig. 12 is a functional block diagram of a multi-dimensional transformation
method that includes generating a mapping between objects in first and second
successive image frames and quantizing the mapping for transmission or storage.

Fig. 13 is a simplified representation of a display screen showing the image
frame of Fig. 7B for purposes of illustrating the multi-dimensional transformation
method of Fig. 12.

Fig. 14 is an enlarged simplified representation showing three selected pixels 
of a transformation block used in the quantization of affine transformation 
coefficients determined by the method of Fig. 12. 

Fig. 15 is a functional block diagram of a transformation block optimization
method utilized in an alternative embodiment of the multi-dimensional
transformation method of Fig. 12.

Fig. 16 is a simplified fragmentary representation of a display screen showing
the image frame of Fig. 7B for purposes of illustrating the transformation block
optimization method of Fig. 15.

Figs. 17A and 17B are a functional block diagram of a precompression
extrapolation method for extrapolating image features of arbitrary configuration to a
predefined configuration to facilitate compression.

Figs. 18A-18D are representations of a display screen on which a simple
object is rendered to show various aspects of the extrapolation method of Figs. 17A and 17B.

Figs. 19A and 19B are functional block diagrams of an encoder method and
a decoder method, respectively, employing a Laplacian pyramid encoder method in
accordance with this invention.

Figs. 20A-20D are simplified representations of the color component values
of an arbitrary set or array of pixels processed according to the encoder process of
Fig. 19A.

Fig. 21 is a functional block diagram of a motion vector encoding process
according to this invention.

Fig. 22 is a functional block diagram of an alternative quantized object 
encoder-decoder process. 

Fig. 23A is a generalized functional block diagram of a video compression
decoder process matched to the encoder process of Fig. 3A. Fig. 23B is a functional
diagram of a master object decoder process according to this invention.

Fig. 24A is a diagrammatic representation of a conventional chain code
format. Fig. 24B is a simplified representation of an exemplary contour for
processing with the chain code format of Fig. 24A.

Fig. 25A is a functional block diagram of a chain coding process of this
invention.

Fig. 25B is a diagrammatic representation of a chain code format of the
present invention.

Fig. 25C is a diagrammatic representation of special case chain code
modifications used in the process of Fig. 25A.

Fig. 26 is a functional block diagram of a sprite generating or encoding 
process. 

Figs. 27A and 27B are respective first and second objects defined by bitmaps
and showing grids of triangles superimposed over the objects in accordance with the
process of Fig. 26.

Fig. 28 is a functional block diagram of a sprite decoding process
corresponding to the encoding process of Fig. 26.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Referring to Fig. 1, an operating environment for the preferred embodiment
of the present invention is a computer system 20, either of a general purpose or a
dedicated type, that comprises at least one high speed central processing unit (CPU)
22, in conjunction with a memory system 24, an input device 26, and an output
device 28. These elements are interconnected by a bus structure 30.

The illustrated CPU 22 is of familiar design and includes an ALU 32 for
performing computations, a collection of registers 34 for temporary storage of data
and instructions, and a control unit 36 for controlling operation of the system 20.
CPU 22 may be a processor having any of a variety of architectures including Alpha
from Digital, MIPS from MIPS Technology, NEC, IDT, Siemens, and others, x86
from Intel and others, including Cyrix, AMD, and NexGen, and the PowerPC from
IBM and Motorola.

The memory system 24 includes main memory 38 and secondary storage 40.
Illustrated main memory 38 takes the form of 16 megabytes of semiconductor RAM
memory. Secondary storage 40 takes the form of long term storage, such as ROM,
optical or magnetic disks, flash memory, or tape. Those skilled in the art will
appreciate that memory system 24 may comprise many other alternative components.

The input and output devices 26, 28 are also familiar. The input device 26
can comprise a keyboard, a mouse, a physical transducer (e.g., a microphone), etc.
The output device 28 can comprise a display, a printer, a transducer (e.g., a speaker),
etc. Some devices, such as a network interface or a modem, can be used as input
and/or output devices.

As is familiar to those skilled in the art, the computer system 20 further
includes an operating system and at least one application program. The operating
system is the set of software which controls the computer system's operation and the
allocation of resources. The application program is the set of software that performs
a task desired by the user, making use of computer resources made available through
the operating system. Both are resident in the illustrated memory system 24.



In accordance with the practices of persons skilled in the art of computer
programming, the present invention is described below with reference to symbolic
representations of operations that are performed by computer system 20, unless
indicated otherwise. Such operations are sometimes referred to as being
computer-executed. It will be appreciated that the operations which are symbolically
represented include the manipulation by CPU 22 of electrical signals representing
data bits and the maintenance of data bits at memory locations in memory system
24, as well as other processing of signals. The memory locations where data bits are
maintained are physical locations that have particular electrical, magnetic, or optical
properties corresponding to the data bits.

Figs. 2A and 2B are simplified representations of a display screen 50 of a
video display device 52 (e.g., a television or a computer monitor) showing two
successive image frames 54a and 54b of a video image sequence represented
electronically by a corresponding video signal. Video signals may be in any of a
variety of video signal formats including analog television video formats such as
NTSC, PAL, and SECAM, and pixelated or digitized video signal formats typically
used in computer displays, such as VGA, CGA, and EGA. Preferably, the video
signals corresponding to image frames are of a digitized video signal format, either
as originally generated or by conversion from an analog video signal format, as is
known in the art.

Image frames 54a and 54b each include a rectangular solid image feature 56
and a pyramid image feature 58 that are positioned over a background 60. Image
features 56 and 58 in image frames 54a and 54b have different appearances because
different parts of them are obscured or shown. For purposes of the following
description, the particular form of an image feature in an image frame is referred to
as an object or, alternatively, a mask. Accordingly, rectangular solid image feature 56
is shown as rectangular solid objects 56a and 56b in respective image frames 54a
and 54b, and pyramid image feature 58 is shown as pyramid objects 58a and 58b in
respective image frames 54a and 54b.

Pyramid image feature 58 is shown with the same position and orientation in
image frames 54a and 54b and would "appear" to be motionless when shown in the
video sequence. Rectangular solid 56 is shown in frames 54a and 54b with a
different orientation and position relative to pyramid 58 and would "appear" to be
moving and rotating relative to pyramid 58 when shown in the video sequence.
These appearances of image features 56 and 58 are figurative and exaggerated. The
image frames of a video sequence typically are displayed at rates in the range of
30-80 Hz. Human perception of video motion typically requires more than two image
frames. Image frames 54a and 54b provide, therefore, a simplified representation of
a conventional video sequence for purposes of illustrating the present invention.
Moreover, it will be appreciated that the present invention is in no way limited to
such simplified video images, image features, or sequences and, to the contrary, is
applicable to video images and sequences of arbitrary complexity.

VIDEO COMPRESSION ENCODER PROCESS OVERVIEW 

Fig. 3A is a generalized functional block diagram of a video compression
encoder process 64 for compressing digitized video signals representing display
motion in video sequences of multiple image frames. Compression of video
information (i.e., video sequences or signals) can provide economical storage and
transmission of digital video information in applications that include, for example,
interactive or digital television and multimedia computer applications. For purposes
of brevity, the reference numerals assigned to function blocks of encoder process 64
are used interchangeably in reference to the results generated by the function blocks.

Conventional video compression techniques utilize similarities between 
successive image frames, referred to as temporal or interframe correlation, to provide 
interframe compression in which pixel-based representations of image frames are 
converted to motion representations. In addition, conventional video compression 
techniques utilize similarities within image frames, referred to as spatial or 
intraframe correlation, to provide intraframe compression in which the motion 
representations within an image frame are further compressed. 

In such conventional video compression techniques, including MPEG-1,
MPEG-2, and H.26X, the temporal and spatial correlations are determined relative to
simple translations of fixed, regular (e.g., square) arrays of pixels. Video
information commonly includes, however, arbitrary video motion that cannot be
represented accurately by translating square arrays of pixels. As a consequence,
conventional video compression techniques typically include significant error
components that limit the compression rate and accuracy.
In contrast, encoder process 64 utilizes object-based video compression to
improve the accuracy and versatility of encoding interframe motion and intraframe
image features. Encoder process 64 compresses video information relative to objects
of arbitrary configurations, rather than fixed, regular arrays of pixels. This reduces
the error components and thereby improves the compression efficiency and accuracy.
As another benefit, object-based video compression provides interactive video editing
capabilities for processing compressed video information.

Referring to Fig. 3A, function block 66 indicates that user-defined objects
within image frames of a video sequence are segmented from other objects within
the image frames. The objects may be of arbitrary configuration and preferably
represent distinct image features in a display image. Segmentation includes
identifying the pixels in the image frames corresponding to the objects. The
user-defined objects are defined in each of the image frames in the video sequence. In
Figs. 2A and 2B, for example, rectangular solid objects 56a and 56b and pyramid
objects 58a and 58b are separately segmented.

The segmented objects are represented by binary or multi-bit (e.g., 8-bit)
"alpha channel" masks of the objects. The object masks indicate the size,
configuration, and position of an object on a pixel-by-pixel basis. For purposes of
simplicity, the following description is directed to binary masks in which each pixel
of the object is represented by a single binary bit rather than the typical 24 bits (i.e.,
8 bits for each of three color component values). Multi-bit (e.g., 8-bit) masks also
have been used.

Function block 68 indicates that "feature points" of each object are defined
by a user. Feature points preferably are distinctive features or aspects of the object.
For example, corners 70a-70c and corners 72a-72c could be defined by a user as
feature points of rectangular solid 56 and pyramid 58, respectively. The pixels
corresponding to each object mask and its feature points in each image frame are
stored in an object database included in memory system 24.

Function block 74 indicates that changes in the positions of feature points in
successive image frames are identified and trajectories determined for the feature
points between successive image frames. The trajectories represent the direction and
extent of movement of the feature points. Function block 76 indicates that
trajectories of the feature points in the object between prior frame N-1 and current
frame N also are retrieved from the object database.

Function block 78 indicates that a sparse motion transformation is determined
for the object between prior frame N-1 and current frame N. The sparse motion
transformation is based upon the feature point trajectories between frames N-1 and
N. The sparse motion transformation provides an approximation of the change of
the object between prior frame N-1 and current frame N.
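The text does not say how the sparse motion transformation is computed from the feature point trajectories. One plausible sketch, assuming a six-coefficient affine model fitted by least squares to three or more corresponding feature points, is shown below; `fit_affine` and `apply_affine` are hypothetical helpers, and the coefficient layout `[[a, b, c], [d, e, f]]` with x' = ax + by + c, y' = dx + ey + f is an assumption of this sketch.

```python
import numpy as np


def fit_affine(src_points, dst_points):
    """Least-squares affine fit mapping feature points in frame N-1 (src) to their
    tracked positions in frame N (dst). Returns a 2x3 array [[a, b, c], [d, e, f]]."""
    src = np.asarray(src_points, dtype=float)
    dst = np.asarray(dst_points, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 3:
        raise ValueError("need at least three corresponding feature points")
    design = np.hstack([src, np.ones((src.shape[0], 1))])  # rows [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(design, dst, rcond=None)   # shape (3, 2)
    return coeffs.T                                         # shape (2, 3)


def apply_affine(coeffs, points):
    """Map (x, y) points through the affine coefficients."""
    pts = np.asarray(points, dtype=float)
    return pts @ coeffs[:, :2].T + coeffs[:, 2]


# Usage: four hypothetical feature points (e.g. corners) and their positions in
# frame N under a motion that combines rotation, shear, scaling, and translation.
true_coeffs = np.array([[1.1, 0.2, 3.0], [-0.1, 0.9, -2.0]])
prev_pts = [(10, 10), (40, 10), (10, 30), (40, 30)]
curr_pts = apply_affine(true_coeffs, prev_pts)
estimated = fit_affine(prev_pts, curr_pts)
```

With exact correspondences the fit recovers the generating coefficients; with noisy trajectories, least squares gives the best affine approximation in the squared-error sense, which matches the text's description of the sparse transformation as an approximation.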

Function block 80 indicates that a mask of an object in a current frame N is
retrieved from the object database in memory system 24.

Function block 90 indicates that a quantized master object or "sprite" is
formed from the objects or masks 66 corresponding to an image feature in an image
frame sequence and feature point trajectories 74. The master object preferably
includes all of the aspects or features of an object as it is represented in multiple
frames. With reference to Figs. 2A and 2B, for example, rectangular solid 56 in
frame 54b includes a side 78b not shown in frame 54a. Similarly, rectangular solid
56 includes a side 78a in frame 54a not shown in frame 54b. The master object for
rectangular solid 56 includes both sides 78a and 78b.

Sparse motion transformation 78 frequently will not provide a complete
representation of the change in the object between frames N-1 and N. For example,
an object in a prior frame N-1, such as rectangular solid object 56a, might not include
all the features of the object in the current frame N, such as side 78b of rectangular
solid object 56b.

To improve the accuracy of the transformation, therefore, an intersection of
the masks of the object in prior frame N-1 and current frame N is determined, such
as by a logical AND function as is known in the art. The resulting intersection is
subtracted from the mask of the object in the current frame N to identify any portions
or features of the object in the current frame N not included in the object in the
prior frame N-1 (e.g., side 78b of rectangular solid object 56b, as described above).
The newly identified portions of the object are incorporated into master object 90 so
that it includes a complete representation of the object in frames N-1 and N.
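The mask arithmetic described above can be sketched with NumPy boolean arrays. This sketch assumes both binary masks already lie on a common pixel grid; a real master object would also have to be aligned for the object's motion, which is ignored here. The function names are illustrative only.

```python
import numpy as np


def new_object_portions(prev_mask, curr_mask):
    """Pixels of the object present in frame N but absent from frame N-1."""
    prev_mask = np.asarray(prev_mask, dtype=bool)
    curr_mask = np.asarray(curr_mask, dtype=bool)
    intersection = prev_mask & curr_mask   # logical AND of the two masks
    return curr_mask & ~intersection       # current mask minus the intersection


def update_master_mask(master_mask, prev_mask, curr_mask):
    """Fold newly exposed portions of the object into the master-object mask."""
    return np.asarray(master_mask, dtype=bool) | new_object_portions(prev_mask, curr_mask)


# Usage: the object moves right, exposing a new 2x2 region in frame N.
prev = np.zeros((4, 6), dtype=bool)
prev[1:3, 0:3] = True
curr = np.zeros((4, 6), dtype=bool)
curr[1:3, 1:5] = True
newly_exposed = new_object_portions(prev, curr)
master = update_master_mask(prev, prev, curr)
```

Subtracting the intersection from the current mask (rather than the reverse) is what isolates the newly visible pixels, such as side 78b in the example above.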
Function block 96 indicates that a quantized form of an object 98 in a prior
frame N-1 (e.g., rectangular solid object 56a in image frame 54a) is transformed by
a dense motion transformation to provide a predicted form of the object 102 in a
current frame N (e.g., rectangular solid object 56b in image frame 54b). This
transformation provides object-based interframe compression.

The dense motion transformation preferably includes determining an affine
transformation between quantized prior object 98 in frame N-1 and the object in the
current frame N and applying the affine transformation to quantized prior object 98.
The preferred affine transformation is represented by affine transformation
coefficients 104 and is capable of describing translation, rotation, magnification, and
shear. The affine transformation is determined from a dense motion estimation,
preferably including a pixel-by-pixel mapping, between prior quantized object 98
and the object in the current frame N.
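Applying affine transformation coefficients 104 to quantized prior object 98 amounts to warping it onto the current frame. The sketch below assumes a 2x3 coefficient layout `[[a, b, c], [d, e, f]]` with x' = ax + by + c and y' = dx + ey + f, inverse mapping, and nearest-neighbour sampling; the text specifies none of these details, and `warp_affine` is a hypothetical name.

```python
import numpy as np


def warp_affine(prev_object, affine, out_shape):
    """Predict the current object by applying a forward affine map (prior -> current).

    Each output pixel is traced back through the inverse map and sampled from the
    nearest prior pixel; pixels whose source lies outside the prior object stay 0."""
    a, b, c = affine[0]
    d, e, f = affine[1]
    inv = np.linalg.inv(np.array([[a, b], [d, e]], dtype=float))  # needs a non-degenerate map
    ys, xs = np.indices(out_shape)
    ys, xs = ys.ravel(), xs.ravel()
    # Remove the translation, then undo the linear part, for every output pixel.
    src_x, src_y = inv @ np.stack([xs - c, ys - f]).astype(float)
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    valid = ((src_x >= 0) & (src_x < prev_object.shape[1])
             & (src_y >= 0) & (src_y < prev_object.shape[0]))
    predicted = np.zeros(out_shape, dtype=prev_object.dtype)
    predicted[ys[valid], xs[valid]] = prev_object[src_y[valid], src_x[valid]]
    return predicted


# Usage: shift an 8x8 prior object right by 2 pixels and down by 1 pixel.
prior = np.arange(64, dtype=np.int32).reshape(8, 8)
predicted = warp_affine(prior, np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]), (8, 8))
```

Inverse mapping is used so that every output pixel receives exactly one value; a forward mapping of prior pixels could leave holes in the predicted object when the transformation magnifies or rotates it.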

Predicted current object 102 is represented by quantized prior object 98, as
modified by dense motion transformation 96, and is capable of representing
relatively complex motion, together with any new image aspects obtained from
master object 90. Such object-based representations are relatively accurate because
the perceptual and spatial continuity associated with objects eliminates errors arising
from the typically changing relationships between different objects in different image
frames. Moreover, the object-based representations allow a user to represent
different objects with different levels of resolution to optimize the relative efficiency
and accuracy for representing objects of varying complexity.

Function block 106 indicates that for image frame N, predicted current object
102 is subtracted from original object 108 for current frame N to determine an
estimated error 110 in predicted object 102. Estimated error 110 is a compressed
representation of current object 108 in image frame N relative to quantized prior
object 98. More specifically, current object 108 may be decoded or reconstructed
from estimated error 110 and quantized prior object 98.

Function block 112 indicates that estimated error 110 is compressed or
"coded" by a conventional "lossy" still image compression method such as lattice
subband or other wavelet compression or encoding as described in Multirate Systems
and Filter Banks by Vaidyanathan, PTR Prentice-Hall, Inc., Englewood Cliffs, New
Jersey (1993), or discrete cosine transform (DCT) encoding as described in JPEG:
Still Image Data Compression Standard by Pennebaker et al., Van Nostrand
Reinhold, New York (1993).

As is known in the art, "lossy" compression methods introduce some data
distortion to provide increased data compression. The data distortion refers to
variations between the original data before compression and the data resulting after
compression and decompression. For purposes of illustration below, the
compression or encoding of function block 112 is presumed to be wavelet encoding.

Function block 114 indicates that the wavelet encoded estimated error from
function block 112 is further compressed or "coded" by a conventional "lossless"
still image compression method to form compressed data 116. A preferred
conventional "lossless" still image compression method is entropy encoding as
described in JPEG: Still Image Data Compression Standard by Pennebaker et al. As
is known in the art, "lossless" compression methods introduce no data distortion.
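Huffman coding is one common form of entropy encoding (JPEG specifies Huffman and arithmetic coding); it is used here only as a concrete, swapped-in example, since the text does not say which entropy coder function block 114 uses. The sketch shows the two properties that matter: symbols that occur often, such as the many zero-valued quantized coefficients, receive short codes, and decoding reproduces the input exactly. All function names are illustrative.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free Huffman code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreak, {symbol: code}); the tiebreak keeps
    # tuples comparable without ever comparing the dictionaries.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    reverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free, so the first match is the symbol
            out.append(reverse[current])
            current = ""
    return out


# Usage: quantized coefficients dominated by zeros compress well and decode losslessly.
coefficients = [0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 2, 0]
code = huffman_code(coefficients)
bits = encode(coefficients, code)
assert decode(bits, code) == coefficients
```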
An error feedback loop 118 utilizes the wavelet encoded estimated error from
function block 112 for the object in frame N to obtain a prior quantized object for
succeeding frame N+1. As an initial step in feedback loop 118, function block 120
indicates that the wavelet encoded estimated error from function block 112 is inverse
wavelet coded, or wavelet decoded, to form a quantized error 122 for the object in
image frame N.

The effect of successively encoding and decoding estimated error 110 by a 
lossy still image compression method is to omit from quantized error 122 video 
information that is generally imperceptible by viewers. This information typically is 
of higher frequencies. As a result, omitting such higher frequency components 
typically can provide image compression of up to about 200% with only minimal
degradation of image quality. 

Function block 124 indicates that quantized error 122 and predicted object
102, both for image frame N, are added together to form a quantized object 126 for
image frame N. After a timing coordination delay 128, quantized object 126
becomes quantized prior object 98 and is used as the basis for processing the
corresponding object in image frame N+1.
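
The closed loop formed by function blocks 112 through 128 can be illustrated with a toy sketch. It is not the encoder described here: it assumes a uniform scalar quantizer (the hypothetical `STEP` and `lossy_round_trip`) in place of wavelet coding and decoding, and it omits motion compensation, so the predicted object is simply the quantized prior object. What it shows is that predicting from the decoder-side reconstruction keeps each frame's reconstruction error within half a quantizer step instead of letting it accumulate.

```python
import numpy as np

STEP = 4.0  # hypothetical quantizer step standing in for wavelet coding and decoding


def lossy_round_trip(error: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for lossy coding (block 112) followed by decoding (block 120)."""
    return STEP * np.round(error / STEP)


def encode_sequence(frames):
    """Closed-loop coding of one object over successive frames.

    Each prediction is formed from the quantized prior object that a decoder would
    also have, not from the original. Motion compensation is omitted (an assumption
    of this sketch): the predicted object is simply the quantized prior object.
    """
    prior_quantized = np.zeros_like(frames[0], dtype=float)  # quantized prior object 98
    reconstructions = []
    for original in frames:                                   # original object 108
        predicted = prior_quantized                           # predicted object 102
        estimated_error = original - predicted                # estimated error 110
        quantized_error = lossy_round_trip(estimated_error)   # quantized error 122
        quantized_object = predicted + quantized_error        # quantized object 126
        reconstructions.append(quantized_object)
        prior_quantized = quantized_object                    # delay 128 feeds frame N+1
    return reconstructions
```

Because the loop predicts from `quantized_object` rather than from the original, a decoder that receives only the quantized errors can rebuild exactly the same sequence.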

Encoder process 64 utilizes the temporal correlation of corresponding objects
in successive image frames to obtain improved interframe compression, and also
utilizes the spatial correlation within objects to obtain accurate and efficient
intraframe compression. For the interframe compression, motion estimation and
compensation are performed so that an object defined in one frame can be estimated
in a successive frame. The motion-based estimation of the object in the successive
frame requires significantly less information than a conventional block-based
representation of the object. For the intraframe compression, an estimated error
signal for each object is compressed to utilize the spatial correlation of the object
within a frame and to allow different objects to be represented at different
resolutions. Feedback loop 118 allows objects in subsequent frames to be predicted
from fully decompressed objects, thereby preventing accumulation of estimation
error.

Encoder process 64 provides as an output a compressed or encoded
representation of a digitized video signal representing display motion in video
sequences of multiple image frames. The compressed or encoded representation
includes object masks 66, feature points 68, affine transform coefficients 104, and
compressed error data 116. The encoded representation may be stored or
transmitted, according to the particular application in which the video information is
used.

Fig. 3B is a functional block diagram of a master object encoder process 130 
for encoding or compressing master object 90. Function block 132 indicates that 
master object 90 is compressed or coded by a conventional "lossy" still image 
compression method such as lattice subband or other wavelet compression or discrete 
cosine transform (DCT) encoding. Preferably, function block 132 employs wavelet
encoding. 

Function block 134 indicates that the wavelet encoded master object from
function block 132 is further compressed or coded by a conventional "lossless" still 
image compression method to form compressed master object data 136. A preferred 
conventional lossless still image compression method is entropy encoding. 

Encoder process 130 provides as an output compressed master object 136. 
Together with the compressed or encoded representations provided by encoder 
process 64, compressed master object 136 may be decompressed or decoded after 
storage or transmission to obtain a video sequence of multiple image frames. 

Encoder process 64 is described with reference to encoding video information 
corresponding to a single object within an image frame. As shown in Figs. 2A and 
2B and indicated above, encoder process 64 is performed separately for each of the 
objects (e.g., objects 56 and 58 of Figs. 2A and 2B) in an image frame. Moreover, 
many video images include a background over which arbitrary numbers of image 
features or objects are rendered. Preferably, the background is processed as an 
object according to this invention after all user-designated objects are processed.

Processing of the objects in an image frame requires that the objects be 
separately identified. Preferably, encoder process 64 is applied to the objects of an 
image frame beginning with the forward-most object or objects and proceeding 
successively to the back-most object (e.g., the background). The compositing of the 
encoded objects into a video image preferably proceeds from the rear-most object 
(e.g., the background) and proceeds successively to the forward-most object (e.g.,
rectangular solid 56 in Figs. 2A and 2B). The layering of encoded objects may be
communicated as distinct layering data associated with the objects of an image frame
or, alternatively, by transmitting or obtaining the encoded objects in a sequence
corresponding to the layering or compositing sequence.
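
A minimal sketch of the rear-to-front compositing order follows. It assumes each decoded object arrives as a full-frame pixel array plus a binary mask (the `DecodedObject` container and `composite` function are hypothetical names) and that objects are supplied in encoding order, forward-most first.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DecodedObject:
    """Hypothetical container for one decoded object: its pixels and binary mask."""
    pixels: np.ndarray  # H x W x 3 color values
    mask: np.ndarray    # H x W booleans, True inside the object


def composite(objects_front_to_back):
    """Composite objects listed in encoding order (forward-most first)."""
    height, width = objects_front_to_back[0].mask.shape
    frame = np.zeros((height, width, 3), dtype=objects_front_to_back[0].pixels.dtype)
    # Paint from the rear-most object (e.g., the background) toward the front,
    # so nearer objects overwrite the pixels of objects behind them.
    for obj in reversed(objects_front_to_back):
        frame[obj.mask] = obj.pixels[obj.mask]
    return frame
```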

OBJECT SEGMENTATION AND TRACKING 

In a preferred embodiment, the segmentation of objects within image frames
referred to in function block 66 allows interactive segmentation by users. The object
segmentation of this invention provides improved accuracy in segmenting objects,
is relatively fast, and provides users with optimal flexibility in defining objects to
be segmented.

Fig. 4 is a functional block diagram of an object segmentation process 140 
for segmenting selected objects from an image frame of a video sequence. Object 
segmentation according to process 140 provides a perceptual grouping of objects that 
is accurate and quick and easy for users to define. 

Fig. 5A is a simplified representation of display screen 50 of video display
device 52 showing image frame 54a and the segmentation of rectangular solid object 
56a. In its rendering on display screen 50, rectangular solid object 56a includes an 
object perimeter 142 (shown spaced apart from object 56a for clarity) that bounds an 
object interior 144. Object interior 144 refers to the outline of object 56a on display 
screen 50 and in general may correspond to an inner surface or, as shown, an outer 
surface of the image feature. Fig. 5B is an enlarged representation of a portion of 
display screen 50 showing the semi-automatic segmentation of rectangular solid 
object 56a. The following description is made with specific reference to rectangular 
solid object 56a, but is similarly applicable to each object to be segmented from an 
image frame. 

Function block 146 indicates that a user forms within object interior 144 an 
interior outline 148 of object perimeter 142. The user preferably forms interior 
outline 148 with a conventional pointer or cursor control device, such as a mouse or 
trackball. Interior outline 148 is formed within a nominal distance 150 from object 
perimeter 142. Nominal distance 150 is selected by a user to be sufficiently large 
that the user can form interior outline 148 relatively quickly within nominal distance 
150 of perimeter 142. Nominal distance 150 corresponds, for example, to between 
about 4 and 10 pixels. 

Function block 146 is performed in connection with a key frame of a video 
sequence. With reference to a scene in a conventional motion picture, for example, 
the key frame could be the first frame of the multiple frames in a scene. The 
participation of the user in this function renders object segmentation process 140 
semi-automatic, but significantly increases the accuracy and flexibility with which 
objects are segmented. Other than for the key frame, objects in subsequent image 
frames are segmented automatically as described below in greater detail. 

Function block 152 indicates that interior outline 148 is expanded
automatically to form an exterior outline 156. The formation of exterior outline 156
is performed as a relatively simple image magnification of outline 148 so that
exterior outline 156 is a user-defined number of pixels from interior outline 148.
Preferably, the distance between interior outline 148 and exterior outline 156 is
approximately twice distance 150.

Function block 158 indicates that pixels between interior outline 148 and 
exterior outline 156 are classified according to predefined attributes as to whether 
they are within object interior 144, thereby to identify automatically object perimeter
142 and a corresponding mask 80 of the type described with reference to Fig. 3A.
Preferably, the image attributes include pixel color and position, but either attribute 
could be used alone or with other attributes. 

In the preferred embodiment, each of the pixels in interior outline 148 and
exterior outline 156 defines a "cluster center" represented as a five-dimensional
vector in the form of (r, g, b, x, y). The terms r, g, and b correspond to the
respective red, green, and blue color components associated with each of the pixels,
and the terms x and y correspond to the pixel locations. The m-number of cluster
center vectors corresponding to pixels in interior outline 148 are denoted as
$\{I_0, I_1, \ldots, I_{m-1}\}$, and the n-number of cluster center vectors corresponding
to pixels in exterior outline 156 are denoted as $\{O_0, O_1, \ldots, O_{n-1}\}$.

Pixels between the cluster center vectors $I_i$ and $O_j$ are classified by
identifying the vector to which each pixel is closest in the five-dimensional vector
space. For each pixel, the absolute distances $d_i$ and $d_j$ to each of the respective
cluster center vectors $I_i$ and $O_j$ are computed according to the following equations:

$$d_i = w_{\mathrm{color}}\left(|r - r_i| + |g - g_i| + |b - b_i|\right) + w_{\mathrm{coord}}\left(|x - x_i| + |y - y_i|\right), \quad 0 \le i < m,$$

$$d_j = w_{\mathrm{color}}\left(|r - r_j| + |g - g_j| + |b - b_j|\right) + w_{\mathrm{coord}}\left(|x - x_j| + |y - y_j|\right), \quad 0 \le j < n,$$

in which $w_{\mathrm{color}}$ and $w_{\mathrm{coord}}$ are weighting factors for the respective color and pixel
position information. Weighting factors $w_{\mathrm{color}}$ and $w_{\mathrm{coord}}$ have values that sum
to 1 and are otherwise selectable by a user. Preferably, weighting factors $w_{\mathrm{color}}$ and
$w_{\mathrm{coord}}$ are of an equal value of 0.5. Each pixel is associated with object interior 144
or exterior according to the minimum five-dimensional distance to one of the cluster
center vectors $I_i$ and $O_j$.
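
The classification of function block 158 can be sketched with NumPy as follows. The function name and the (k, 5) array layout are assumptions for illustration. The weighted L1 distances follow the equations above, and ties between an interior and an exterior center are resolved toward the interior, which the text does not specify.

```python
import numpy as np


def classify_pixels(pixels, interior_centers, exterior_centers,
                    w_color=0.5, w_coord=0.5):
    """Label each (r, g, b, x, y) pixel as interior (True) or exterior (False).

    Every argument is array-like of shape (k, 5) holding (r, g, b, x, y) rows.
    Distances are the weighted L1 distances d_i and d_j.
    """
    pixels = np.asarray(pixels, dtype=float)

    def weighted_l1(centers):
        centers = np.asarray(centers, dtype=float)
        diff = np.abs(pixels[:, None, :] - centers[None, :, :])  # shape (p, k, 5)
        color = diff[:, :, :3].sum(axis=2)                        # |r-r_i|+|g-g_i|+|b-b_i|
        coord = diff[:, :, 3:].sum(axis=2)                        # |x-x_i|+|y-y_i|
        return w_color * color + w_coord * coord                  # shape (p, k)

    d_interior = weighted_l1(interior_centers).min(axis=1)  # nearest I_i
    d_exterior = weighted_l1(exterior_centers).min(axis=1)  # nearest O_j
    # Ties go to the interior in this sketch.
    return d_interior <= d_exterior
```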

Function block 162 indicates that a user selects at least two, and preferably
more (e.g., 4 to 6), feature points in each object of an initial or key frame.
Preferably, the feature points are relatively distinctive aspects of the object. With
reference to rectangular solid image feature 56, for example, corners 70a-70c could
be selected as feature points.

Function block 164 indicates that a block 166 of multiple pixels centered
about each selected feature point (e.g., corners 70a-70c) is defined and matched to a
corresponding block in a subsequent image frame (e.g., the next successive image
frame). Pixel block 166 is user defined, but preferably includes a 32x32 pixel
array that includes only pixels within object interior 144. Any pixels 168 (indicated
by cross-hatching) of pixel block 166 falling outside object interior 144 as
determined by function block 158 (e.g., corners 70b and 70c) are omitted. Pixel
blocks 166 are matched to the corresponding pixel blocks in the next image frame
according to a minimum absolute error identified by a conventional block match
process or a polygon match process, as described below in greater detail.

Function block 170 indicates that a sparse motion transformation of an object
is determined from the corresponding feature points in two successive image frames.
Function block 172 indicates that mask 80 of the current image frame is transformed
according to the sparse motion transformation to provide an estimation of the mask
80 for the next image frame. Any feature point in a current frame not identified in
a successive image frame is disregarded.

Function block 174 indicates that the resulting estimation of mask 80 for the
next image frame is delayed by one frame, and functions as an outline 176 for a
next successive cycle. Similarly, function block 178 indicates that the corresponding
feature points also are delayed by one frame, and utilized as the initial feature points
180 for the next successive frame.

POLYGON MATCH METHOD

Fig. 6 is a functional block diagram of a polygon match process 200 for 
determining a motion vector for each corresponding pair of pixels in successive 
image frames. Such a dense motion vector determination provides the basis for 
determining the dense motion transformations 96 of Fig. 3A.

Polygon match process 200 is capable of determining extensive motion
between successive image frames like the conventional block match process. In
contrast to the conventional block match process, however, polygon match process
200 maintains its accuracy for pixels located near or at an object perimeter and
generates significantly less error. A preferred embodiment of polygon match method
200 has improved computational efficiency.

Polygon match method 200 is described with reference to Figs. 7A and 7B,
which are simplified representations of display screen 50 showing two successive
image frames 202a and 202b in which an image feature 204 is rendered as objects
204a and 204b, respectively.

Function block 206 indicates that objects 204a and 204b for image frames 
202a and 202b are identified and segmented by, for example, object segmentation 
method 140. 

Function block 208 indicates that dimensions are determined for a pixel block 
210b (e.g., 15x15 pixels) to be applied to object 204b and a search area 212 about 
object 204a. Pixel block 210b defines a region about each pixel in object 204b for 
which region a corresponding pixel block 210a is identified in object 204a. Search
area 212 establishes a region within which corresponding pixel block 210a is sought.
Preferably, pixel block 210b and search area 212 are right regular arrays of pixels
and of sizes defined by the user.

Function block 214 indicates that an initial pixel 216 in object 204b is 
identified and designated the current pixel. Initial pixel 216 may be defined by any 
of a variety of criteria such as, for example, the pixel at the location of greatest 
vertical extent and minimum horizontal extent. With the pixels on display screen 50 
arranged according to a coordinate axis 220 as shown, initial pixel 216 may be 
represented as the pixel of object 204b having a maximum y-coordinate value and a
minimum x-coordinate value.

Function block 222 indicates that pixel block 210b is centered at and extends 
about the current pixel. 

Function block 224 represents an inquiry as to whether pixel block 210b
includes pixels that are not included in object 204b (e.g., pixels 226 shown by
cross-hatching in Fig. 7B). This inquiry is made with reference to the objects identified
according to function block 206. Whenever pixels within pixel block 210b 
positioned at the current pixel fall outside object 204b, function block 224 proceeds 
to function block 228 and otherwise proceeds to function block 232. 

Function block 228 indicates that pixels of pixel block 210b falling outside
object 204b (e.g., pixels 226) are omitted from the region defined by pixel block
210b so that it includes only pixels within object 204b. As a result, pixel block 
210b defines a region that typically would be of a polygonal shape more complex 
than the originally defined square or rectangular region. 

Function block 232 indicates that a pixel in object 204a is identified as
corresponding to the current pixel in object 204b. The pixel in object 204a is
referred to as the prior corresponding pixel. Preferably, the prior corresponding
pixel is identified by forming a pixel block 210a about each pixel in search area 212
and determining a correlation between the pixel block 210a and pixel block 210b
about the current pixel in object 204b. Each correlation between pixel blocks 210a
and 210b may be determined, for example, by an absolute error. The prior
corresponding pixel is identified by identifying the pixel block 210a in search area
212 for which the absolute error relative to pixel block 210b is minimized. A
summed absolute error E for a pixel block 210a relative to pixel block 210b may be
determined as:

$$E = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( |r_{ij} - r'_{ij}| + |g_{ij} - g'_{ij}| + |b_{ij} - b'_{ij}| \right),$$

in which the terms $r_{ij}$, $g_{ij}$, and $b_{ij}$ correspond to the respective red, green, and blue
color components associated with each of the pixels in pixel block 210b and the
terms $r'_{ij}$, $g'_{ij}$, and $b'_{ij}$ correspond to the respective red, green, and blue color
components associated with each of the pixels in pixel block 210a.

As set forth above, the summations for the absolute error E imply pixel
blocks that are arrays of m×n pixels. Pixel blocks 210b of
polygonal configuration are accommodated relatively simply by, for example,
defining zero values for the color components of all pixels outside polygonal pixel
blocks 210b.
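
One way to realize the polygonal summed absolute error is sketched below, assuming a boolean mask marks which positions of the m×n block lie inside object 204b. The sketch zeroes the excluded positions in both blocks, so they contribute nothing to E; zeroing both blocks, rather than block 210b alone, is an assumption of this sketch. `polygon_sae` is a hypothetical name.

```python
import numpy as np


def polygon_sae(block_b, block_a, mask_b):
    """Summed absolute error E over the pixels of a polygonal block.

    block_b, block_a: m x n x 3 RGB arrays for pixel blocks 210b and 210a.
    mask_b: m x n booleans, True where block 210b lies inside object 204b.
    Positions outside the polygon are zeroed in both blocks, so they add nothing to E.
    """
    keep = np.asarray(mask_b, dtype=bool)[:, :, None]          # broadcast over r, g, b
    b = np.where(keep, np.asarray(block_b, dtype=np.int64), 0)  # int64 avoids uint8 wraparound
    a = np.where(keep, np.asarray(block_a, dtype=np.int64), 0)
    return int(np.abs(b - a).sum())
```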

Function block 234 indicates that a motion vector MV between each pixel in
object 204b and the corresponding prior pixel in object 204a is determined. A
motion vector is defined as the difference between the locations of the pixel in
object 204b and the corresponding prior pixel in object 204a:

$$MV = (x_i - x'_i,\; y_i - y'_i),$$

in which the terms $x_i$ and $y_i$ correspond to the respective x- and y-coordinate
positions of the pixel in pixel block 210b, and the terms $x'_i$ and $y'_i$ correspond to the
respective x- and y-coordinate positions of the corresponding prior pixel in pixel
block 210a.
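
Combining the polygonal block with an exhaustive search gives a sketch of function blocks 222 through 234 for a single pixel. It is illustrative only. It assumes both frames share one shape and are indexed [y, x] with y increasing downward (unlike coordinate axis 220), a square search window centred on the current pixel, and hypothetical names and defaults (`match_pixel`, `search=8`).

```python
import numpy as np


def match_pixel(frame_b, mask_b, frame_a, x, y, half_block=7, search=8):
    """Polygon match for one current pixel (x, y) of object 204b.

    frame_b, frame_a: current and prior H x W x 3 frames, indexed [y, x].
    mask_b: H x W booleans marking object 204b; (x, y) must lie inside it.
    half_block=7 gives the 15x15 block of function block 208; the +/-search
    window is a hypothetical square search area 212 centred on (x, y).
    Returns the motion vector MV = (x - x', y - y').
    """
    h, w = mask_b.shape
    # Polygonal block 210b: offsets of the square block that stay inside object 204b.
    offsets = [(dy, dx)
               for dy in range(-half_block, half_block + 1)
               for dx in range(-half_block, half_block + 1)
               if 0 <= y + dy < h and 0 <= x + dx < w and mask_b[y + dy, x + dx]]
    dys = np.array([dy for dy, _ in offsets])
    dxs = np.array([dx for _, dx in offsets])
    block_b = frame_b[y + dys, x + dxs].astype(np.int64)  # k x 3 color values
    best = None
    for yp in range(y - search, y + search + 1):
        for xp in range(x - search, x + search + 1):
            ya, xa = yp + dys, xp + dxs
            if ya.min() < 0 or xa.min() < 0 or ya.max() >= h or xa.max() >= w:
                continue  # candidate block 210a would fall outside the prior frame
            block_a = frame_a[ya, xa].astype(np.int64)
            err = np.abs(block_b - block_a).sum()  # summed absolute error E
            if best is None or err < best[0]:
                best = (err, xp, yp)
    _, xp, yp = best
    return (x - xp, y - yp)
```

Candidate blocks that would extend past the prior frame are skipped; the candidate at zero displacement always fits, so a best match always exists.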

Function block 236 represents an inquiry as to whether object 204b includes

any remaining pixels. Whenever object 204b includes remaining pixels, function 
block 236 proceeds to function block 238 and otherwise proceeds to end block 240. 

Function block 238 indicates that a next pixel in object 204b is identified
according to a predetermined format or sequence. With the initial pixel selected as
described above in reference to function block 214, subsequent pixels may be
defined by first identifying the next adjacent pixel in a row (i.e., of a common
y-coordinate value) and, if object 204b includes no other pixels in the row, proceeding
to the first or left-most pixel (i.e., of minimum x-coordinate value) in the next lower
row. The pixel so identified is designated the current pixel and function block 238
returns to function block 222.
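
With the coordinate convention of axis 220 (y increasing upward), the visiting order of function blocks 214 and 238 amounts to sorting the object's pixels by descending y and then ascending x. The helper below is hypothetical; for rows with gaps it still visits the row's pixels left to right, which is one reasonable reading of "next adjacent pixel in a row".

```python
def scan_order(object_pixels):
    """Visit order for the pixels of object 204b (function blocks 214 and 238).

    object_pixels: iterable of (x, y) coordinates with y increasing upward, as with
    coordinate axis 220. Rows are visited from the top (maximum y) downward, and each
    row from left (minimum x) to right.
    """
    return sorted(object_pixels, key=lambda p: (-p[1], p[0]))
```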

Polygon match method 200 accurately identifies corresponding pixels even if
they are located at or near an object perimeter. A significant source of error in
conventional block matching processes is eliminated by omitting or disregarding
pixels of pixel blocks 210b falling outside object 204b. Conventional block
matching processes rigidly apply a uniform pixel block configuration and are not
applied with reference to a segmented object. The uniform block configurations
cause significant errors for pixels adjacent to the perimeter of an object because the
pixels outside the object can undergo significant changes as the object moves or its
background changes. With such extraneous pixel variations included in conventional
block matching processes, pixels in the vicinity of an object perimeter cannot be
correlated accurately with the corresponding pixels in prior image frames.

For each pixel in object 204b, a corresponding prior pixel in object 204a is
identified by comparing pixel block 210b with a pixel block 210a for each of the
pixels in prior object 204a. The corresponding prior pixel is the pixel in object 204a
having the pixel block 210a that best correlates to pixel block 210b. If processed in
a conventional manner, such a determination can require substantial computation to
identify each corresponding prior pixel. To illustrate, for pixel blocks having
dimensions of n×n pixels, which are significantly smaller than a search area 212
having dimensions of m×m pixels, approximately n²×m² calculations are required to
identify each corresponding prior pixel in the prior object 204a.
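
The conventional cost can be checked with a one-line cost model. Using the 15x15 block of function block 208 and an assumed 32x32 search area (the search-area size is hypothetical), each prior pixel needs 15²×32² = 230,400 absolute-difference evaluations.

```python
def conventional_cost(n: int, m: int) -> int:
    """Absolute-difference evaluations to find one prior pixel by exhaustive search.

    An n x n pixel block is compared at each of the roughly m x m positions of the
    search area, and each comparison touches n * n pixels.
    """
    return (n * n) * (m * m)
```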


PIXEL BLOCK CORRELATION PROCESS 

Fig. 8 is a functional block diagram of a modified pixel block correlation 
process 260 that preferably is substituted for the one described with reference to 
function block 232. Modified correlation process 260 utilizes redundancy inherent
in correlating pixel blocks 210b and 210a to significantly reduce the number of
calculations required.

Correlation process 260 is described with reference to Figs. 9A-9G and 10A-10G,
which schematically represent arbitrary groups of pixels corresponding to
successive image frames 202a and 202b. In particular, Fig. 9A is a schematic
representation of a pixel block 262 having dimensions of 5x5 pixels in which each
letter corresponds to a different pixel. The pixels of pixel block 262 are arranged as
a right regular array of pixels that includes distinct columns 264. Fig. 9B represents
an array of pixels 266 having dimensions of q×q pixels and corresponding to a
search area 212 in a prior image frame 202a. Each of the numerals in Fig. 9B
represents a different pixel. Although described with reference to a conventional
right regular pixel block 262, correlation process 260 is similarly applicable to
polygonal pixel blocks of the type described with reference to polygon match
process 200.

Function block 268 indicates that an initial pixel block (e.g., pixel block 262)
is defined with respect to a central pixel M and scanned across a search area 212
(e.g., pixel array 266) generally in a raster pattern (partly shown in Fig. 7A) as in a
conventional block match process. Figs. 9C-9G schematically illustrate five of the
approximately q² steps in the block matching process between pixel block 262 and
pixel array 266.

Although the scanning of pixel block 262 across pixel array 266 is performed
in a conventional manner, computations relating to the correlation between them are
performed differently according to this invention. In particular, a correlation (e.g.,
an absolute error) is determined and stored for each column 264 of pixel block 262
in each scan position. The correlation that is determined and stored for each column
264 of pixel block 262 in each scanned position is referred to as a column
correlation 270, several of which are symbolically indicated in Figs. 9C-9G by
referring to the correlated pixels. To illustrate, Fig. 9C shows a column correlation
270(1) that is determined for the single column 264 of pixel block 262 aligned with
pixel array 266. Similarly, Fig. 9D shows column correlations 270(2) and 270(3)
that are determined for the two columns 264 of pixel block 262 aligned with pixel
array 266. Figs. 9E-9G show similar column correlations with pixel block 262 at
three exemplary subsequent scan positions relative to pixel array 266.

The scanning of initial pixel block 262 over pixel array 266 provides a stored
array or database of column correlations. With pixel block 262 having r-number of
columns 264, and pixel array 266 having q×q pixels, the column correlation database
includes approximately rq² column correlations. This number of column
correlations is only approximate because pixel block 262 preferably is initially
scanned across pixel array 266 such that pixel M is aligned with the first row of
pixels in pixel array 266.
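
A sketch of the column correlation database follows. It indexes each entry by block column and displacement (dx, dy) rather than by scan position within pixel array 266, an equivalent bookkeeping chosen here for clarity; single-channel frames, the function names, and the omission of bounds checks are likewise assumptions. It shows that a block's absolute error at any scan position is the sum of its stored column correlations, and that the table holds r·(2s+1)² entries, the analogue of rq².

```python
import numpy as np


def column_correlations(frame_b, frame_a, cx, cy, half, search):
    """Column correlations for the block centred at (cx, cy) of the current frame.

    Returns a dict keyed by (column x, dx, dy): the absolute error between that
    block column and the same-height column of the prior frame displaced by (dx, dy).
    Frames are 2-D (single-channel) arrays indexed [y, x]; bounds checks are omitted,
    so the caller must keep every displaced column inside frame_a.
    """
    rows = slice(cy - half, cy + half + 1)
    table = {}
    for x in range(cx - half, cx + half + 1):
        col_b = frame_b[rows, x].astype(np.int64)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                col_a = frame_a[cy - half + dy:cy + half + 1 + dy, x + dx].astype(np.int64)
                table[(x, dx, dy)] = int(np.abs(col_b - col_a).sum())
    return table


def block_error(table, cx, half, dx, dy):
    """Block absolute error at displacement (dx, dy) as the sum of its column correlations."""
    return sum(table[(x, dx, dy)] for x in range(cx - half, cx + half + 1))
```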

The remaining steps beginning with the one indicated in Fig. 9C occur after
two complete scans of pixel block 262 across pixel array 266 (i.e., with pixel M
aligned with the first and second rows of pixel array 266).

Function block 274 indicates that a next pixel block 276 (Fig. 10A) is
defined from, for example, image frame 202b with respect to a central pixel N in the
same row as pixel M. Pixel block 276 includes a column 278 of pixels not included
in pixel block 262 and columns 280 of pixels included in pixel block 262. Pixel
block 276 does not include a column 282 (Fig. 9A) that was included in pixel block
262. Such an incremental definition of next pixel block 276 is substantially the
same as that used in conventional block matching processes.

Function block 284 indicates that pixel block 276 is scanned across pixel
array 266 in the manner described above with reference to function block 268. As
with Figs. 9C-9G, Figs. 10B-10G represent the scanning of pixel block 276 across
pixel array 266.

Function block 286 indicates that for column 278 a column correlation is
determined and stored at each scan position. Accordingly, column correlations
288(1)-288(5) are made with respect to the scanned positions of column 278 shown
in respective Figs. 10B-10F.

Function block 290 indicates that for each of columns 280 in pixel block 276
a stored column correlation is retrieved for each scan position previously
computed and stored in function block 268. For example, column correlation 270(1)
of Fig. 9C is the same as column correlation 270'(1) of Fig. 10C. Similarly, column
correlations 270'(2), 270'(3), 270'(5)-270'(8), and 270'(15)-270'(18) of Figs. 10D-10F
are the same as the corresponding column correlations in Figs. 9D, 9E, and 9G.
For pixel block 276, therefore, only one column correlation 288 is calculated for
each scan position. As a result, because only one of its five columns requires new
computation, the number of calculations required for pixel block
276 is reduced by nearly 80 percent.
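
The reuse described for pixel block 276 can be sketched with a cache keyed by column and displacement. The class below is hypothetical and simplified (single-channel frames, a fixed block row, no bounds checks). Its counter shows that a block of r = 5 columns costs 5·(2s+1)² column correlations, while the next block in the same row costs only (2s+1)², one new column per scan position, which is the 80 percent saving noted above.

```python
import numpy as np


class ColumnCorrelationCache:
    """Caches column correlations so horizontally adjacent blocks share work.

    frame_b, frame_a: 2-D arrays indexed [y, x]; half: block half-width
    (r = 2 * half + 1); search: maximum displacement. No bounds checking.
    """

    def __init__(self, frame_b, frame_a, cy, half, search):
        self.b = frame_b.astype(np.int64)
        self.a = frame_a.astype(np.int64)
        self.cy, self.half, self.search = cy, half, search
        self.cache = {}
        self.computed = 0  # number of column correlations actually evaluated

    def column(self, x, dx, dy):
        """Absolute error of block column x against the prior frame displaced by (dx, dy)."""
        key = (x, dx, dy)
        if key not in self.cache:
            top, bottom = self.cy - self.half, self.cy + self.half + 1
            col_b = self.b[top:bottom, x]
            col_a = self.a[top + dy:bottom + dy, x + dx]
            self.cache[key] = int(np.abs(col_b - col_a).sum())
            self.computed += 1
        return self.cache[key]

    def best_displacement(self, cx):
        """Return (error, dx, dy) minimising the block error for the block centred at (cx, cy)."""
        best = None
        s = self.search
        for dy in range(-s, s + 1):
            for dx in range(-s, s + 1):
                err = sum(self.column(x, dx, dy)
                          for x in range(cx - self.half, cx + self.half + 1))
                if best is None or err < best[0]:
                    best = (err, dx, dy)
        return best
```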

Function block 292 indicates that a subsequent pixel block 294 (Fig. 11A) is
defined with respect to a central pixel R in the next successive row relative to pixel
M. Pixel block 294 includes columns 296 of pixels that are similar to but distinct
from columns 264 of pixels in pixel block 262 of Fig. 9A. In particular, columns
296 include pixels A'-E' not included in columns 264. Such an incremental
definition of subsequent pixel block 294 is substantially the same as that used in
conventional block matching processes.




Function block 298 indicates that pixel block 294 is scanned across pixel
array 266 (Fig. 9B) in the manner described above with reference to function blocks
268 and 284. Figs. 11B-11F represent the scanning of pixel block 294 across pixel
array 266.

Function block 300 indicates that a column correlation is determined and
stored for each of columns 296. Accordingly, column correlations 302(1)-302(18)
are made with respect to the scanned positions of columns 296 shown in Figs. 11B-11F.

Each of column correlations 302(1)-302(18) may be calculated in an 
abbreviated manner with reference to column correlations made with respect to pixel
block 262 (Fig. 9A). 

For example, column correlations 302(4)-302(8) of Fig. 11D include
subcolumn correlations 304'(4)-304'(8) that are the same as subcolumn correlations
304(4)-304(8) of Fig. 9E. Accordingly, column correlations 302(4)-302(8) may be
determined from respective column correlations 270(4)-270(8) by subtracting from
the latter correlation values for pixels 01A, 02B, 03C, 04D, and 05E to form
subcolumn correlations 304(4)-304(8), respectively. Column correlations
302(4)-302(8) may be obtained by adding correlation values for the pixel pairs
56A', 57B', 58C', 59D', and 50E' to the respective subcolumn correlation values
304(4)-304(8), respectively.

The determination of column correlations 302(4)-302(8) from respective
column correlations 270(4)-270(8) entails subtracting individual pixel correlation
values corresponding to the row of pixels A-E of pixel block 262 not included in
pixel block 294, and adding pixel correlation values for the row of pixels A'-E'
included in pixel block 294 but not pixel block 262. This method substitutes, for
each of column correlations 302(4)-302(8), one subtraction and one addition for the
five additions that would be required to determine each column correlation in a
conventional manner. With pixel blocks of larger dimensions as are preferred, the
improvement of this method over conventional calculation methods is even greater.
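
The one-subtraction, one-addition update can be written as a small helper. It is a sketch that assumes single-channel frames indexed [y, x] and a column already correlated at displacement (dx, dy); `shift_column_down` is a hypothetical name.

```python
def shift_column_down(prev_corr, frame_b, frame_a, x, cy, half, dx, dy):
    """Update a column correlation when the block centre moves from row cy to row cy + 1.

    prev_corr is the absolute error of block column x (rows cy-half .. cy+half of the
    current frame) against the prior frame displaced by (dx, dy). The row leaving the
    block is subtracted and the row entering it is added, instead of re-summing the
    whole column. Frames are 2-D arrays indexed [y, x].
    """
    leaving, entering = cy - half, cy + half + 1
    out_term = abs(int(frame_b[leaving, x]) - int(frame_a[leaving + dy, x + dx]))
    in_term = abs(int(frame_b[entering, x]) - int(frame_a[entering + dy, x + dx]))
    return prev_corr - out_term + in_term
```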

Conventional block matching processes identify only total block correlations for each
scan position of initial pixel block 262 relative to pixel array 266. As a
consequence, all correlation values for all pixels must be calculated separately for
each scan position. In contrast, correlation process 260 utilizes stored column
correlations 270 to significantly reduce the number of calculations required. The
improvements in speed and processor resource requirements provided by correlation
process 260 more than offset the system requirements for storing the column
correlations.

It will be appreciated that correlation process 260 has been described with 
reference to Figs. 9-11 to illustrate specific features of this invention. As shown in 
the illustrations, this invention includes recurring or cyclic features that are 
particularly suited to execution by a computer system. These recurring or cyclic
features are dependent upon the dimensions of pixel blocks and pixel arrays and are
well understood and can be implemented by persons skilled in the art.

MULTI-DIMENSIONAL TRANSFORMATION 

Fig. 12 is a functional block diagram of a transformation method 350 that
includes generating a multi-dimensional transformation between objects in first and
second successive image frames and quantizing the mapping for transmission or
storage. The multi-dimensional transformation preferably is utilized in connection
with function block 96 of Fig. 3A. Transformation method 350 is described with
reference to Fig. 7A and Fig. 13, the latter of which, like Fig. 7B, is a simplified
representation of display screen 50 showing image frame 202b in which image
feature 204 is rendered as object 204b.

Transformation method 350 preferably provides a multi-dimensional affine
transformation capable of representing complex motion that includes any or all of
translation, rotation, magnification, and shear. Transformation method 350 provides
a significant improvement over conventional video compression methods such as
MPEG-1, MPEG-2, and H.26X, which are of only one dimension and represent only
translation. In this regard, the dimensionality of a transformation refers to the
number of coordinates in the generalized form of the transformation, as described
below in greater detail. Increasing the accuracy with which complex motion is
represented according to this invention results in fewer errors than by conventional
representations, thereby increasing compression efficiency.

Function block 352 indicates that a dense motion estimation of the pixels in
objects 204a and 204b is determined. Preferably, the dense motion estimation is
obtained by polygon match process 200. As described above, the dense motion
estimation includes motion vectors between pixels at coordinates $(x_i, y_i)$ in object
204b of image frame 202b and corresponding pixels at locations $(x'_i, y'_i)$ of object
204a in image frame 202a.

Function block 354 indicates that an array of transformation blocks 356 is
defined to encompass object 204b. Preferably, transformation blocks 356 are right
regular arrays of pixels having dimensions of, for example, 32x32 pixels.

Function block 358 indicates that a multi-dimensional affine transformation is
generated for each transformation block 356. Preferably, the affine transformations
are of first order and represented as:

$$
\begin{aligned}
x_i' &= a x_i + b y_i + c \\
y_i' &= d x_i + e y_i + f,
\end{aligned}
$$

and are determined with reference to all pixels for which the motion vectors have a
relatively high confidence. These affine transformations are of two dimensions in
that $x_i'$ and $y_i'$ are defined relative to two coordinates: $x_i$ and $y_i$.
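The six coefficients act as a 2x2 linear map plus an offset. The following minimal Python sketch (assuming NumPy; the helper `affine_map` and the sample coefficient sets are hypothetical illustrations, not part of the described method) applies the first-order transformation to pixel coordinates and shows coefficient choices that produce translation, rotation, magnification, and shear.

```python
import numpy as np

def affine_map(points, a, b, c, d, e, f):
    """Apply x' = a*x + b*y + c and y' = d*x + e*y + f to an (n, 2) array of (x, y)."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([a * x + b * y + c, d * x + e * y + f])

theta = np.pi / 2
# Hypothetical coefficient sets (a, b, c, d, e, f) for each kind of motion.
translation = (1.0, 0.0, 3.0, 0.0, 1.0, -2.0)
rotation = (np.cos(theta), np.sin(theta), 0.0, -np.sin(theta), np.cos(theta), 0.0)
magnification = (2.0, 0.0, 0.0, 0.0, 2.0, 0.0)
shear = (1.0, 0.5, 0.0, 0.0, 1.0, 0.0)

pts = np.array([[1.0, 0.0], [0.0, 2.0]])
for name, coeffs in [("translation", translation), ("rotation", rotation),
                     ("magnification", magnification), ("shear", shear)]:
    print(name, affine_map(pts, *coeffs).round(6).tolist())
```

Only shear needs the off-diagonal coefficients `b` and `d` to differ from a pure rotation pattern. This is why the translation-only form used by conventional block matching cannot represent it.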

The relative confidence of the motion vectors refers to the accuracy with
which the motion vector between corresponding pixels can be determined uniquely
relative to other pixels. For example, motion vectors between particular pixels that
are in relatively large pixel arrays and are uniformly colored (e.g., black) cannot
typically be determined accurately. In particular, for a black pixel in a first image
frame, many pixels in the pixel array of the subsequent image frame will have the
same correlation (i.e., absolute value error between pixel blocks).

In contrast, pixel arrays in which pixels correspond to distinguishing features
typically will have relatively high correlations for particular corresponding pixels in
successive image frames.

The relatively high correlations are preferably represented as a minimal
absolute value error determination for a particular pixel. Motion vectors of relatively
high confidence may, therefore, be determined relative to such uniquely low error
values. For example, a high confidence motion vector may be defined as one in
which the minimum absolute value error for the motion vector is less than the next
greater error value associated with the pixel by a difference amount that is greater
than a threshold difference amount. Alternatively, high confidence motion vectors
may be defined with respect to the second order derivative of the absolute error
values upon which the correlations are determined. A second order derivative of
more than a particular value would indicate a relatively high correlation between
specific corresponding pixels.
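The threshold-difference criterion can be sketched in a few lines of Python with NumPy. This sketch assumes that `errors` holds the absolute-value errors computed for every candidate displacement searched for one pixel, and it reads "the next greater error value" as the second-smallest entry. Both the input layout and the name `is_high_confidence` are hypothetical. The second-derivative alternative is not shown.

```python
import numpy as np

def is_high_confidence(errors, threshold):
    """True when the best match beats the runner-up by more than `threshold`.

    errors: array of absolute-value errors, one per candidate displacement.
    A uniformly colored region yields many equal errors, so the gap between
    the smallest and second-smallest error is zero and the test fails.
    """
    ordered = np.sort(np.asarray(errors, dtype=float).ravel())
    if ordered.size < 2:
        return False
    return bool(ordered[1] - ordered[0] > threshold)

# Distinct feature: one candidate has a uniquely low error.
textured = np.array([[40.0, 35.0, 42.0], [38.0, 3.0, 37.0], [41.0, 36.0, 39.0]])
# Uniform black region: every candidate matches equally well.
uniform = np.zeros((3, 3))
```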

With n-number of pixels with such high-confidence motion vectors, the
preferred affine transformation equations are solved with reference to n-number of
corresponding pixels in image frames 202a and 202b. The image frames must include
at least three corresponding pixels in image frames 202a and 202b with high
confidence motion vectors to solve for the six unknown coefficients a, b, c, d, e, and
f of the preferred affine transformation equations, because each correspondence
supplies two equations (one for $x_i'$ and one for $y_i'$). With the preferred dimensions,
each of transformation blocks 356 includes $2^{10}$ (i.e., 1024) pixels, of which significant
numbers typically have relatively high confidence motion vectors. Accordingly, the affine
transformation equations are over-determined in that a significantly greater number
of pixels are available to solve for the coefficients a, b, c, d, e, and f.

The resulting n-number of equations may be represented by the linear 
algebraic expression: 






wo 97/13372 






PCT/US96/15892 



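$$
\begin{bmatrix}
x_0 & y_0 & 1 \\
\vdots & \vdots & \vdots \\
x_{n-1} & y_{n-1} & 1
\end{bmatrix}
\begin{bmatrix}
a & d \\
b & e \\
c & f
\end{bmatrix}
=
\begin{bmatrix}
x_0' & y_0' \\
\vdots & \vdots \\
x_{n-1}' & y_{n-1}'
\end{bmatrix}
$$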



Preferably these equations are solved by a conventional singular value
decomposition (SVD) method, which provides a minimal least-square error for the
approximation of the dense motion vectors. A conventional SVD method is
described, for example, in Numerical Recipes in C, by Press et al., Cambridge
University Press (1992).
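The least-squares solve can be sketched in Python with NumPy. `numpy.linalg.lstsq` minimizes the squared residual using an SVD-based LAPACK routine, so it stands in here for the conventional SVD method. The helper name `fit_affine` and the synthetic 32x32 block are illustrative assumptions. In practice, `src` and `dst` would hold only the high-confidence correspondences.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares solution of the over-determined affine system.

    src: (n, 2) array of (x_i, y_i) in image frame 202b (high-confidence pixels)
    dst: (n, 2) array of the corresponding (x_i', y_i') in image frame 202a
    Returns the 3x2 coefficient matrix [[a, d], [b, e], [c, f]].
    """
    n = src.shape[0]
    if n < 3:
        raise ValueError("at least three correspondences are required")
    design = np.column_stack([src, np.ones(n)])            # rows [x_i, y_i, 1]
    coeffs, *_ = np.linalg.lstsq(design, dst, rcond=None)  # SVD-based solver
    return coeffs

# Synthetic 32x32 transformation block with a known affine motion.
ys, xs = np.mgrid[0:32, 0:32]
src = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
true = np.array([[1.02, 0.05], [-0.03, 0.98], [2.5, -1.25]])
dst = np.column_stack([src, np.ones(len(src))]) @ true
estimated = fit_affine(src, dst)
```

With 1024 correspondences and six unknowns, the system is heavily over-determined. Noise in individual motion vectors is therefore averaged out rather than copied into the coefficients.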

As described above, the preferred two-dimensional affine transformation
equations are capable of representing translation, rotation, magnification, and shear
of transformation blocks 356 between successive image frames 202a and 202b. In
contrast, conventional motion transformation methods used in prior compression
standards employ simplified transformation equations of the form:

$$
\begin{aligned}
x_i' &= x_i + g \\
y_i' &= y_i + h
\end{aligned}
$$

The prior simplified transformation equations represent motion by only two
coefficients, g and h, which represent only one-third the amount of information
(i.e., coefficients) obtained by the preferred multi-dimensional transformation
equations. To obtain superior compression of the information obtained by
transformation method 350 relative to conventional compression methods, the
dimensions of transformation block 356 preferably are more than three times larger
than the corresponding 16x16 pixel blocks employed in MPEG-1 and MPEG-2
compression methods. The preferred 32x32 pixel dimensions of transformation
blocks 356 encompass four times the number of pixels employed in the
transformation blocks of conventional transformation methods. Six coefficients per
1024 pixels (about 0.006 coefficients per pixel) is therefore fewer than two
coefficients per 256 pixels (about 0.008 coefficients per pixel). The larger
dimensions of transformation blocks 356, together with the improved accuracy with
which the affine transformation coefficients represent motion of the transformation
blocks 356, allow transformation method 350 to provide greater compression than
conventional compression methods.

It will be appreciated that the affine coefficients generated according to the
present invention typically would be non-integer, floating point values that could be
difficult to compress adequately without adversely affecting their accuracy.
Accordingly, it is preferable to quantize the affine transformation coefficients to
reduce the bandwidth required to store or transmit them.

Function block 362 indicates that the affine transformation coefficients
generated with reference to function block 358 are quantized to reduce the
bandwidth required to store or transmit them. Fig. 14 is an enlarged fragmentary
representation of a transformation block 356 showing three selected pixels, 364a,
364b, and 364c, from which the six preferred affine transformation coefficients a-f
may be determined.

Pixels 364a-364c are represented as pixel coordinates $(x_1, y_1)$, $(x_2, y_2)$, and
$(x_3, y_3)$, respectively. Based upon the dense motion estimation of function block
352, pixels 364a-364c have respective corresponding pixels $(x_1', y_1')$, $(x_2', y_2')$,
and $(x_3', y_3')$ in preceding image frame 202a. As is conventional, pixel locations
$(x_i, y_i)$ are represented by integer values and are solutions to the affine transformation
equations upon which the preferred affine transformation coefficients are based. Accordingly,
selected pixels 364a-364c are used to calculate the corresponding pixels from the
preceding image frame 202a, which typically will be floating point values.

Quantization of these floating point values is performed by converting to
integer format the difference between corresponding pixels $(x_i' - x_i,\ y_i' - y_i)$. The affine
transformation coefficients are determined by first calculating the pixel values $(x_i',
y_i')$ from the difference vectors and the pixel values $(x_i, y_i)$, and then solving the
multi-dimensional transformation equations of function block 358 with respect to the
pixel values $(x_i', y_i')$.

As shown in Fig. 14, pixels 364a-364c preferably are distributed about
transformation block 356 to minimize the sensitivity of the quantization to local
variations within transformation block 356. Preferably, pixel 364a is positioned at
or adjacent the center of transformation block 356, and pixels 364b and 364c are
positioned at upper corners. Also in the preferred embodiment, the selected pixels
for each of the transformation blocks 356 in object 204b have the same positions,
thereby allowing the quantization process to be performed efficiently.
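The quantization round trip can be sketched as follows. This minimal Python/NumPy sketch assumes a 32x32 block with its top-left pixel at (0, 0), a center pixel at (16, 16), and upper corners at (0, 0) and (31, 0). These positions and the helper names are hypothetical. The encoder maps the three selected pixels, rounds the difference vectors to integers, and transmits only those. The decoder rebuilds $(x_i', y_i')$ and solves the 3x3 system exactly.

```python
import numpy as np

# Hypothetical positions of selected pixels 364a (center), 364b and 364c (upper corners).
SELECTED = np.array([[16.0, 16.0], [0.0, 0.0], [31.0, 0.0]])
DESIGN = np.column_stack([SELECTED, np.ones(3)])  # rows [x_i, y_i, 1]

def quantize_affine(coeffs):
    """coeffs = [[a, d], [b, e], [c, f]] -> integer differences (x_i' - x_i, y_i' - y_i)."""
    mapped = DESIGN @ coeffs                      # corresponding pixels, floating point
    return np.rint(mapped - SELECTED).astype(np.int32)

def dequantize_affine(diffs):
    """Rebuild (x_i', y_i') from the integer differences and solve exactly for a-f."""
    mapped = SELECTED + diffs
    return np.linalg.solve(DESIGN, mapped)

# A slightly rotated and shifted block whose coefficients are non-integer.
coeffs = np.array([[0.99, 0.05], [-0.05, 0.99], [3.3, -1.7]])
diffs = quantize_affine(coeffs)       # six small integers to store or transmit
restored = dequantize_affine(diffs)   # coefficients recovered by the decoder
```

Rounding to whole pixels bounds the displacement error at each selected pixel to half a pixel. Spreading the three pixels across the block keeps the 3x3 system well conditioned, which matches the motivation given for their placement.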

Another aspect of the quantization method of function block 362 is that
different levels of quantization may be used to represent varying degrees of motion.
As a result, relatively simple motion (e.g., translation) may be represented by fewer
selected pixels 364 than are required to represent complex motion. With respect to
the affine transformation equations described above, pixel 364a $(x_1, y_1)$ from object
204b and the corresponding pixel $(x_1', y_1')$ from object 204a are sufficient to solve
simplified affine transformation equations of the form:

$$
\begin{aligned}
x_1' &= x_1 + c \\
y_1' &= y_1 + f,
\end{aligned}
$$

which represent translation between successive image frames. Pixel 364a
specifically is used because its central position generally represents translational
motion independent of the other types of motion. Accordingly, a user may
selectively represent simplified motion such as translation with simplified affine
transformation equations that require one-third the data required to represent
complex motion.

Similarly, a pair of selected pixels $(x_1, y_1)$ (e.g., pixel 364a) and $(x_2, y_2)$ (i.e.,
either of pixels 364b and 364c) from object 204b and the corresponding pixels
$(x_1', y_1')$ and $(x_2', y_2')$ from object 204a are sufficient to solve simplified affine
transformation equations of the form:

$$
\begin{aligned}
x_i' &= a x_i + c \\
y_i' &= e y_i + f,
\end{aligned}
$$

which are capable of representing motions that include translation and magnification
between successive image frames. In the simplified form:

$$
\begin{aligned}
x' &= a\cos\theta\, x + a\sin\theta\, y + c \\
y' &= -a\sin\theta\, x + a\cos\theta\, y + f
\end{aligned}
$$

the corresponding pairs of selected pixels are capable of representing motions that
include translation, rotation, and isotropic magnification. In this simplified form, the
common coefficients of the x and y variables allow the equations to be solved by
two corresponding pairs of pixels.

Accordingly, a user may selectively represent moderately complex motion
that includes translation, rotation, and magnification with partly simplified affine
transformation equations. Such equations would require two-thirds the data required
to represent complex motion. Adding the third selected pixel $(x_3, y_3)$ from object
204b, the corresponding pixel $(x_3', y_3')$ from object 204a, and the complete preferred
affine transformation equations allows a user also to represent shear between
successive image frames.
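The reduced-data cases can be illustrated in Python with NumPy. The translation case needs one correspondence. For the rotation-plus-isotropic-magnification case, this sketch writes the model as $x' = p x + q y + c$, $y' = -q x + p y + f$ with $p = a\cos\theta$ and $q = a\sin\theta$. That substitution is an assumption introduced here because it makes the system linear, so two correspondences give four equations in four unknowns. The function names are hypothetical.

```python
import numpy as np

def solve_translation(p1, p1_prime):
    """One correspondence: x' = x + c, y' = y + f. Returns (c, f)."""
    (x, y), (xp, yp) = p1, p1_prime
    return xp - x, yp - y

def solve_similarity(pts, pts_prime):
    """Two correspondences: x' = p*x + q*y + c, y' = -q*x + p*y + f.

    p = a*cos(theta) and q = a*sin(theta); returns (a, theta, c, f).
    """
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(pts, pts_prime):
        rows.append([x, y, 1.0, 0.0])    # coefficients of (p, q, c, f) in the x' equation
        rhs.append(xp)
        rows.append([y, -x, 0.0, 1.0])   # coefficients of (p, q, c, f) in the y' equation
        rhs.append(yp)
    p, q, c, f = np.linalg.solve(np.array(rows), np.array(rhs))
    return np.hypot(p, q), np.arctan2(q, p), c, f

# Hypothetical motion: magnification 1.1, rotation 0.2 rad, shift (4, -3).
a, th, c, f = 1.1, 0.2, 4.0, -3.0
p, q = a * np.cos(th), a * np.sin(th)
pts = [(16.0, 16.0), (0.0, 0.0)]         # e.g. pixel 364a and one upper corner
pts_prime = [(p * x + q * y + c, -q * x + p * y + f) for x, y in pts]
```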

A preferred embodiment of transformation method 350 (Fig. 12) is described
as using uniform transformation blocks 356 having dimensions of, for example,
32x32 pixels. The preferred multi-dimensional affine transformations described with
reference to function block 358 are determined with reference to transformation
blocks 356. It will be appreciated that the dimensions of transformation blocks 356
directly affect the compression ratio provided by this method.

Fewer transformation blocks 356 of relatively large dimensions are required
to represent transformations of an object between image frames than the number of
transformation blocks 356 having smaller dimensions. A consequence of uniformly
large transformation blocks 356 is that correspondingly greater error can be
introduced for each transformation block. Accordingly, uniformly sized
transformation blocks 356 typically have moderate dimensions to balance these
conflicting performance constraints.


TRANSFORMATION BLOCK OPTIMIZATION 

Fig. 15 is a functional block diagram of a transformation block optimization
method 370 that automatically selects transformation block dimensions that provide a
minimal error threshold. Optimization method 370 is described with reference to
Fig. 16, which is a simplified representation of display screen 50 showing a portion
of image frame 202b with object 204b.

Function block 372 indicates that an initial transformation block 374 is
defined with respect to object 204b. Initial transformation block 374 preferably is of
maximal dimensions that are selectable by a user and are, for example, 64x64 pixels.
Initial transformation block 374 is designated the current transformation block.

Function block 376 indicates that a current peak signal-to-noise ratio (SNR)
is calculated with respect to the current transformation block. The signal-to-noise
ratio preferably is calculated as the ratio of the variance of the color component
values of the pixels within the current transformation block (i.e., the signal) to the
variance of the color component values of the pixels associated with estimated error
110 (Fig. 3).

Function block 378 indicates that the current transformation block (e.g.,
transformation block 374) is subdivided into, for example, four equal sub-blocks
380a-380d, that affine transformations are determined for each of sub-blocks
380a-380d, and that a future signal-to-noise ratio is determined with respect to the
affine transformations. The future signal-to-noise ratio is calculated in substantially the
same manner as the current signal-to-noise ratio described with reference to function
block 376.

Inquiry block 382 represents an inquiry as to whether the future signal-to-noise
ratio is greater than the current signal-to-noise ratio by more than a user-selected
threshold amount. This inquiry represents a determination that further
subdivision of the current transformation block (e.g., transformation block 374)
would improve the accuracy of the affine transformations by at least the threshold
amount. Whenever the future signal-to-noise ratio is greater than the current
signal-to-noise ratio by more than the threshold amount, inquiry block 382 proceeds to
function block 384, and otherwise proceeds to function block 388.

Function block 384 indicates that sub-blocks 380a-380d are successively
designated the current transformation block, and each is analyzed to determine
whether it should be further subdivided. For purposes of illustration, sub-block 380a is
designated the current transformation block, processed according to function block 376,
and further subdivided into sub-blocks 386a-386d. Function block 388 indicates that a next
successive transformation block 374' is identified and designated an initial or current
transformation block.
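The subdivision loop of blocks 376 through 384 can be sketched in Python with NumPy as a recursive quadtree. Several details are assumptions not given in the text: the variance-ratio SNR is expressed in decibels, the future SNR pools the signal and error of all four sub-blocks, and a hypothetical `min_size` stops the recursion. The `measure` callback is also hypothetical. In the toy version it fits an affine model to one synthetic motion component and returns the fit residual in place of estimated error 110.

```python
import numpy as np

def snr_db(signal, error, eps=1e-12):
    """Signal variance over error variance, in dB (eps avoids division by zero)."""
    return 10.0 * np.log10((np.var(signal) + eps) / (np.var(error) + eps))

def optimize_blocks(x0, y0, size, measure, threshold_db, min_size=4):
    """Return the (x0, y0, size) blocks chosen by recursive subdivision."""
    half = size // 2
    if half < min_size:
        return [(x0, y0, size)]
    current = snr_db(*measure(x0, y0, size))                     # function block 376
    subs = [(x0, y0), (x0 + half, y0), (x0, y0 + half), (x0 + half, y0 + half)]
    parts = [measure(x, y, half) for x, y in subs]                # function block 378
    future = snr_db(np.concatenate([s.ravel() for s, _ in parts]),
                    np.concatenate([e.ravel() for _, e in parts]))
    if future - current > threshold_db:                           # inquiry block 382
        blocks = []
        for x, y in subs:                                         # function block 384
            blocks += optimize_blocks(x, y, half, measure, threshold_db, min_size)
        return blocks
    return [(x0, y0, size)]

def make_measure(field):
    """Toy measure: least-squares affine fit to one motion component per block."""
    def measure(x0, y0, size):
        ys, xs = np.mgrid[y0:y0 + size, x0:x0 + size]
        design = np.column_stack([xs.ravel(), ys.ravel(), np.ones(size * size)])
        target = field[y0:y0 + size, x0:x0 + size].ravel()
        coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
        return target, target - design @ coeffs
    return measure

# Two differently moving halves inside one 64x64 initial block.
ys, xs = np.mgrid[0:64, 0:64]
field = np.where(xs < 32, 0.1 * xs, 5.0 - 0.05 * ys)
blocks = optimize_blocks(0, 0, 64, make_measure(field), threshold_db=3.0)
```

Here the single 64x64 block cannot follow both motions, so it is split. Each resulting 32x32 block already fits exactly, so further splitting brings no SNR gain and stops.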

PRECOMPRESSION EXTRAPOLATION METHOD 

Figs. 17A and 17B are a functional block diagram of a precompression
extrapolation method 400 for extrapolating image features of arbitrary configuration
to a predefined configuration to facilitate compression in accordance with function
block 112 of encoder process 64 (both of Fig. 3). Extrapolation method 400 allows
the compression of function block 112 to be performed in a conventional manner
such as DCT or lattice or other wavelet compression, as described above.

Conventional still image compression methods such as lattice or other
wavelet compression or discrete cosine transforms (DCT) operate upon rectangular
arrays of pixels. As described above, however, the methods of the present invention
are applicable to image features or objects of arbitrary configuration. Extrapolating
such objects or image features to a rectangular pixel array configuration allows use
of conventional still image compression methods such as lattice or other wavelet
compression or DCT. Extrapolation method 400 is described below with reference
to Figs. 18A-18D, which are representations of display screen 50 on which a simple
object 402 is rendered to show various aspects of extrapolation method 400.
Function block 404 indicates that an extrapolation block boundary 406 is
defined about object 402. Extrapolation block boundary 406 preferably is
rectangular. Referring to Fig. 18A, the formation of extrapolation block boundary
406 about object 402 is based upon an identification of a perimeter 408 of object
402 by, for example, object segmentation method 140 (Fig. 4). Extrapolation block
boundary 406 is shown encompassing object 402 in its entirety for purposes of
illustration. It will be appreciated that extrapolation block boundary 406 could
alternatively encompass only a portion of object 402. As described with reference to
object segmentation method 140, pixels included in object 402 have color component
values that differ from those of pixels not included in object 402.

Function block 410 indicates that all pixels 412 bounded by extrapolation
block boundary 406 and not included in object 402 are assigned a predefined value
such as, for example, a zero value for each of the color components.

Function block 414 indicates that horizontal lines of pixels within
extrapolation block boundary 406 are scanned to identify horizontal lines with
horizontal pixel segments having both zero and non-zero color component values.
Function block 416 represents an inquiry as to whether the horizontal pixel
segments having color component values of zero are bounded at both ends by
perimeter 408 of object 402. Referring to Fig. 18B, region 418 represents horizontal
pixel segments having color component values of zero that are bounded at both ends
by perimeter 408. Regions 420 represent horizontal pixel segments that have color
component values of zero and are bounded at only one end by perimeter 408.
Function block 416 proceeds to function block 426 for regions 418 in which the
pixel segments have color component values of zero bounded at both ends by
perimeter 408 of object 402, and otherwise proceeds to function block 422.

Function block 422 indicates that the pixels in each horizontal pixel segment
of a region 420 are assigned the color component values of a pixel 424 (only
exemplary ones shown) in the corresponding horizontal lines and perimeter 408 of
object 402. Alternatively, the color component values assigned to the pixels in
regions 420 are functionally related to the color component values of pixels 424.

Function block 426 indicates that the pixels in each horizontal pixel segment
in region 418 are assigned color component values corresponding to, and preferably
equal to, an average of the color component values of pixels 428a and 428b that are
in the corresponding horizontal lines and on perimeter 408.

Function block 430 indicates that vertical lines of pixels within extrapolation 
block boundary 406 are scanned to identify vertical lines with vertical pixel 
segments having both zero and non-zero color component values. 

Function block 432 represents an inquiry as to whether the vertical pixel 
segments in vertical lines having color component values of zero are bounded at 
both ends by perimeter 408 of object 402. Referring to Fig. 18C, region 434 
represents vertical pixel segments having color component values of zero that are 
bounded at both ends by perimeter 408. Regions 436 represent vertical pixel 
segments that have color component values of zero and are bounded at only one end 
by perimeter 408. Function block 432 proceeds to function block 444 for region
434 in which the vertical pixel segments have color component values of zero
bounded at both ends by perimeter 408 of object 402, and otherwise proceeds to
function block 438.

Function block 438 indicates that the pixels in each vertical pixel segment of 
region 436 are assigned the color component values of pixels 442 (only exemplary 
ones shown) in the vertical lines and perimeter 408 of object 402. Alternatively, the 
color component values assigned to the pixels in region 436 are functionally related 
to the color component values of pixels 442. 

Function block 444 indicates that the pixels in each vertical pixel segment in
region 434 are assigned color component values corresponding to, and preferably
equal to, an average of the color component values of pixels 446a and 446b that are
in the corresponding vertical lines and on perimeter 408.

Function block 448 indicates that pixels that are in both horizontal and 
vertical pixel segments that are assigned color component values according to this 
method are assigned composite color component values that relate to, and preferably 
are the average of, the color component values otherwise assigned to the pixels 
according to their horizontal and vertical pixel segments. 

Examples of pixels assigned such composite color component values are
those pixels in regions 418 and 434.

Function block 450 indicates that regions 452 of pixels bounded by
extrapolation block boundary 406 and not intersecting perimeter 408 of object 402
along a horizontal or vertical line are assigned composite color component values
that are related to, and preferably equal to the average of, the color component
values assigned to adjacent pixels. Referring to Fig. 18D, each of pixels 454 in
regions 452 is assigned a color component value that preferably is the average of the
color component values of pixels 456a and 456b that are aligned with pixel 454
along respective horizontal and vertical lines and have non-zero color component
values previously assigned by this method.

A benefit of object extrapolation process 400 is that it assigns smoothly
varying color component values to pixels not included in object 402 and therefore
optimizes the compression capabilities and accuracy of conventional still image
compression methods. In contrast, prior art zero padding or mirror image methods,
as described by Chang et al., "Transform Coding of Arbitrarily-Shaped Image
Segments," ACM Multimedia, pp. 83-88, June 1993, apply compression to
extrapolated objects that are filled with pixels having zero color component values
such as those applied in function block 410. The drastic image change that occurs
between an object and the zero-padded regions introduces high frequency changes
that are difficult to compress or that introduce image artifacts upon compression.
Object extrapolation method 400 overcomes such disadvantages.
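The horizontal scan, vertical scan, compositing, and remaining-region steps can be sketched in Python with NumPy for one color component. The sketch makes three assumptions. Object membership comes from a boolean mask rather than a non-zero test, so objects that themselves contain zero values are handled. The "functionally related" alternatives are not shown. For regions 452, pixels 456a and 456b are taken as the nearest already-assigned pixels in the same row and in the same column. The function names are hypothetical.

```python
import numpy as np

def _fill_line(vals, inside):
    """Fill the non-object pixels of one line; returns (values, assigned mask)."""
    out = np.zeros(vals.shape)
    assigned = np.zeros(vals.shape, dtype=bool)
    idx = np.flatnonzero(inside)
    if idx.size == 0:                                   # line never meets the object
        return out, assigned
    # Segments bounded at only one end copy that perimeter pixel (regions 420/436).
    out[:idx[0]], assigned[:idx[0]] = vals[idx[0]], True
    out[idx[-1] + 1:], assigned[idx[-1] + 1:] = vals[idx[-1]], True
    # Segments bounded at both ends average the two perimeter pixels (regions 418/434).
    for left, right in zip(idx[:-1], idx[1:]):
        if right - left > 1:
            out[left + 1:right] = 0.5 * (vals[left] + vals[right])
            assigned[left + 1:right] = True
    return out, assigned

def extrapolate(block, inside):
    """Fill every pixel of a rectangular block that lies outside the object mask."""
    block = block.astype(float)
    h, w = block.shape
    hv, hd = np.zeros((h, w)), np.zeros((h, w), dtype=bool)
    vv, vd = np.zeros((h, w)), np.zeros((h, w), dtype=bool)
    for r in range(h):                                  # horizontal scan
        hv[r], hd[r] = _fill_line(block[r], inside[r])
    for c in range(w):                                  # vertical scan
        vv[:, c], vd[:, c] = _fill_line(block[:, c], inside[:, c])
    out = np.where(inside, block, 0.0)                  # outside pixels start at zero
    both = hd & vd
    out[both] = 0.5 * (hv[both] + vv[both])             # composite values (block 448)
    out[hd & ~vd] = hv[hd & ~vd]
    out[vd & ~hd] = vv[vd & ~hd]
    assigned = inside | hd | vd
    for r, c in zip(*np.nonzero(~assigned)):            # regions 452
        candidates = []
        cols = np.flatnonzero(assigned[r])
        if cols.size:
            candidates.append(out[r, cols[np.argmin(np.abs(cols - c))]])
        rows = np.flatnonzero(assigned[:, c])
        if rows.size:
            candidates.append(out[rows[np.argmin(np.abs(rows - r))], c])
        if candidates:
            out[r, c] = np.mean(candidates)
    return out

row = extrapolate(np.array([[0.0, 2.0, 0.0, 6.0, 0.0]]),
                  np.array([[False, True, False, True, False]]))
```

In the one-line example, the interior gap receives the average of its two perimeter pixels. The two end segments each copy their single bounding pixel.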


ALTERNATIVE ENCODER METHOD 

Fig. 19A is a functional block diagram of an encoder method 500 that
employs a Laplacian pyramid encoder with unique filters that maintain nonlinear
aspects of image features, such as edges, while also providing high compression.
Conventional Laplacian pyramid encoders are described, for example, in "The
Laplacian Pyramid as a Compact Image Code" by Burt and Adelson, IEEE Trans.
Comm., Vol. 31, No. 4, pp. 532-540, April 1983. Encoder method 500 is capable
of providing the encoding described with reference to function block 112 of video
compression encoder process 64 shown in Fig. 3, as well as whenever else DCT or
wavelet encoding is suggested or used. By way of example, encoder method 500 is
described with reference to encoding of estimated error 110 (Fig. 3).






A first decimation filter 502 receives pixel information corresponding to an
estimated error 110 (Fig. 3) and filters the pixels according to a filter criterion. In a
conventional Laplacian pyramid method, the decimation filter is a low-pass filter
such as a Gaussian weighting function. In accordance with encoder method 500,
however, decimation filter 502 preferably employs a median filter and, more
specifically, a 3x3 nonseparable median filter.

To illustrate, Fig. 20A is a simplified representation of the color component
values for one color component (e.g., red) for an arbitrary set or array of pixels 504.
Although described with particular reference to red color component values, this
illustration is similarly applied to the green and blue color component values of
pixels 504.

With reference to the preferred embodiment of decimation filter 502, filter
blocks 506 having dimensions of 3x3 pixels are defined among pixels 504. For each
pixel block 506, the median pixel intensity value is identified or selected. With
reference to pixel blocks 506a-506c, for example, decimation filter 502 provides the
respective values of 8, 9, and 10, which are listed as the first three pixels 512 in Fig.
20B.

It will be appreciated, however, that decimation filter 502 could employ other
median filters according to this invention. Accordingly, for each group of pixels
having associated color component values of $\{a_0, a_1, \ldots, a_{n-1}\}$, the median filter
would select the median value of the group.
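A 3x3 nonseparable median filter can be sketched in Python with NumPy. This sketch assumes a sliding window centered on each pixel with edge replication at the borders, so the output has the same size as pixels 504 before the separate 2x2 down sampling. The text does not specify border handling. The values of Fig. 20A are not reproduced. The example instead shows the edge-preserving behavior that motivates the choice: a step edge passes through unchanged, while an isolated outlier is removed. A Gaussian low-pass filter would smear the step across neighboring pixels.

```python
import numpy as np

def median3x3(channel):
    """3x3 nonseparable median filter with edge replication at the borders.

    Each output pixel is the median of the nine values in the 3x3 block
    centred on it, so the output has the same shape as the input.
    """
    padded = np.pad(channel, 1, mode="edge")
    h, w = channel.shape
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0)

step = np.array([[0, 0, 0, 9, 9, 9]] * 6, dtype=float)   # vertical edge
spike = np.full((5, 5), 8.0)
spike[2, 2] = 200.0                                        # isolated outlier
```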

A first 2x2 down sampling filter 514 samples alternate pixels 512 in vertical
and horizontal directions to provide additional compression. Fig. 20C represents a
resulting compressed set of pixels 515.

A 2x2 up sample filter 516 inserts a pixel of zero value in place of each
pixel 512 omitted by down sampling filter 514, and interpolation filter 518 assigns
to the zero-value pixel a pixel value of an average of the opposed adjacent pixels, or
a previous assigned value if the zero-value pixel is not between an opposed pair of
non-zero value pixels. To illustrate, Fig. 20D represents a resulting set or array of
pixels 520.
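The up sample and interpolation step can be sketched in Python with NumPy. The text does not say how pixels with no horizontally or vertically retained neighbors are filled. This sketch assumes the retained rows are filled first and the remaining rows are then averaged from the filled rows above and below, which is bilinear in the interior. Where no opposed pair exists (the last column or row), the previous value is reused, as the text describes. The function name is hypothetical.

```python
import numpy as np

def upsample_interpolate(small, shape):
    """2x2 up sampling followed by interpolation to an array of the given shape.

    `small` must have shape ((h + 1) // 2, (w + 1) // 2), as produced by
    taking alternate pixels, e.g. pixels[::2, ::2].
    """
    h, w = shape
    out = np.zeros((h, w))
    out[::2, ::2] = small                      # retained pixels; zeros elsewhere
    for c in range(1, w, 2):                   # zero-value pixels in retained rows
        if c + 1 < w:
            out[::2, c] = 0.5 * (out[::2, c - 1] + out[::2, c + 1])
        else:                                  # no opposed pair: reuse previous value
            out[::2, c] = out[::2, c - 1]
    for r in range(1, h, 2):                   # remaining rows, from the filled rows
        if r + 1 < h:
            out[r] = 0.5 * (out[r - 1] + out[r + 1])
        else:
            out[r] = out[r - 1]
    return out

restored = upsample_interpolate(np.array([[0.0, 4.0], [8.0, 12.0]]), (4, 4))
```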

A difference 522 is taken between the color component values of the set of
pixels 504 and the corresponding color component values for set of pixels 520 to
form a zero-order image component $I_0$.

A second decimation filter 526 receives color component values
corresponding to the compressed set of pixels 515 generated by first 2x2 down
sampling filter 514. Decimation filter 526 preferably is the same as decimation filter
502 (e.g., a 3x3 nonseparable median filter). Accordingly, decimation filter 526
functions in the same manner as decimation filter 502 and delivers a resulting
compressed set or array of pixels (not shown) to a second 2x2 down sampling filter
528.

Down sampling filter 528 functions in the same manner as down sampling
filter 514 and forms a second order image component $L_2$ that also is delivered to a
2x2 up sample filter 530 and an interpolation filter 531 that function in the same
manner as up sample filter 516 and interpolation filter 518, respectively. A
difference 532 is taken between the color component values of the set of pixels 515
and the resulting color component values provided by interpolation filter 531 to form
a first-order image component $I_1$.

The image components $I_0$, $I_1$, and $L_2$ are respective $n \times n$,
$\frac{n}{2} \times \frac{n}{2}$, and $\frac{n}{4} \times \frac{n}{4}$ sets of color component
values that represent the color component values for an $n \times n$ array of pixels 504.

Image component $I_0$ maintains the high frequency components (e.g., edges) of
an image represented by the original set of pixels 504. Image components $I_1$ and $L_2$
represent low frequency aspects of the original image. Image components $I_0$, $I_1$, and
$L_2$ provide relative compression of the original image. Image components $I_0$ and $I_1$
maintain high frequency features (e.g., edges) in a format that is highly compressible
due to the relatively high correlation between the values of adjacent pixels. Image
component $L_2$ is not readily compressible because it includes primarily low
frequency image features, but is a set of relatively small size.

Fig. 19B is a functional block diagram of a decoder method 536 that decodes
or inverse encodes image components $I_0$, $I_1$, and $L_2$ generated by encoder method
500. Decoder method 536 includes a first 2x2 up sample filter 538 that receives
image component $L_2$ and interposes a pixel of zero value between each adjacent pair
of pixels. An interpolation filter 539 assigns to the zero-value pixel a pixel value
that preferably is an average of the values of the adjacent pixels, or a previous
assigned value if the zero-value pixel is not between an opposed pair of
non-zero-value pixels. First 2x2 up sample filter 538 operates in substantially the same
manner as up sample filters 516 and 530 of Fig. 19A, and interpolation filter 539
operates in substantially the same manner as interpolation filters 518 and 531.

A sum 540 is determined between image component $I_1$ and the color
component values corresponding to the decompressed set of pixels generated by first
2x2 up sample filter 538 and interpolation filter 539. A second 2x2 up sample filter
542 interposes a pixel of zero value between each adjacent pair of pixels generated
by sum 540. An interpolation filter 543 assigns to the zero-value pixel a pixel value
that includes an average of the values of the adjacent pixels, or a previous assigned
value if the zero-value pixel is not between an opposed pair of non-zero-value
pixels. Up sample filter 542 and interpolation filter 543 are substantially the same
as up sample filter 538 and interpolation filter 539, respectively.

A sum 544 sums the image component $I_0$ with the color component values
corresponding to the decompressed set of pixels generated by second 2x2 up sample
filter 542 and interpolation filter 543. Sum 544 provides decompressed estimated
error 110 corresponding to the estimated error 110 delivered to encoder method 500.
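The two-level encoder of Fig. 19A and the decoder of Fig. 19B can be sketched together in Python with NumPy to show why decoding inverts encoding. The decoder repeats exactly the up sample and interpolation steps the encoder subtracted, so each sum cancels the corresponding difference. The sketch assumes the same median and interpolation details as the separate sketches of those filters: a centered window, edge replication, and rows filled before the remaining rows. Any quantization or further coding of the components is omitted, so the input is recovered up to floating-point rounding. The function names are hypothetical.

```python
import numpy as np

def median3x3(x):
    """3x3 nonseparable median filter (edges replicated)."""
    p = np.pad(x, 1, mode="edge")
    h, w = x.shape
    return np.median(np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)]), axis=0)

def up_interp(small, shape):
    """2x2 up sampling plus interpolation, identical in encoder and decoder."""
    h, w = shape
    out = np.zeros((h, w))
    out[::2, ::2] = small
    for c in range(1, w, 2):
        out[::2, c] = 0.5 * (out[::2, c - 1] + out[::2, c + 1]) if c + 1 < w else out[::2, c - 1]
    for r in range(1, h, 2):
        out[r] = 0.5 * (out[r - 1] + out[r + 1]) if r + 1 < h else out[r - 1]
    return out

def encode(pixels):
    """Encoder method 500: returns I0 (n x n), I1 (n/2 x n/2), and L2 (n/4 x n/4)."""
    set_515 = median3x3(pixels)[::2, ::2]               # filters 502 and 514
    i0 = pixels - up_interp(set_515, pixels.shape)      # filters 516, 518; difference 522
    l2 = median3x3(set_515)[::2, ::2]                   # filters 526 and 528
    i1 = set_515 - up_interp(l2, set_515.shape)         # filters 530, 531; difference 532
    return i0, i1, l2

def decode(i0, i1, l2):
    """Decoder method 536: rebuild from the coarsest component upward."""
    set_515 = i1 + up_interp(l2, i1.shape)              # filters 538, 539; sum 540
    return i0 + up_interp(set_515, i0.shape)            # filters 542, 543; sum 544

rng = np.random.default_rng(0)
error_110 = rng.integers(-20, 21, size=(16, 16)).astype(float)  # stand-in for estimated error 110
i0, i1, l2 = encode(error_110)
```

Because the median filter appears only on the encoder side, it shapes which values end up in $I_0$, $I_1$, and $L_2$. It never has to be inverted.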

TRANSFORM CODING OF MOTION VECTORS 

Conventional video compression encoder processes, such as MPEG-1 or
MPEG-2, utilize only sparse motion vector fields to represent the motion of
significantly larger pixel arrays of a regular size and configuration. The motion
vector fields are sparse in that only one motion vector is used to represent the
motion of a pixel array having dimensions of, for example, 16x16 pixels. The
sparse motion vector fields, together with transform encoding of underlying images
or pixels by, for example, discrete cosine transform (DCT) encoding, provide
conventional video compression encoding.

In contrast, video compression encoding process 64 (Fig. 3) utilizes dense 
motion vector fields in which motion vectors are determined for all, or virtually all, 
pixels of an object. Such dense motion vector fields significantly improve the 
accuracy with which motion between corresponding pixels is represented. Although 
the increased accuracy can significantly reduce the errors associated with 
conventional sparse motion vector field representations, the additional information 
included in dense motion vector fields represents an increase in the amount of
information representing a video sequence. In accordance with this invention, 
therefore, dense motion vector fields are themselves compressed or encoded to 
improve the compression ratio provided by this invention. 

Fig. 21 is a functional block diagram of a motion vector encoding process
560 for encoding or compressing motion vector fields and, preferably, dense motion
vector fields such as those generated in accordance with dense motion transformation
96 of Fig. 3. It will be appreciated that such dense motion vector fields from a
selected object typically will have greater continuity or "smoothness" than the
underlying pixels corresponding to the object. As a result, compression or encoding
of the dense motion vector fields will attain a greater compression ratio than would
compression or encoding of the underlying pixels.
Function block 562 indicates that a dense motion vector field is obtained for
an object or a portion of an object in accordance with, for example, the processes of
function block 96 described with reference to Fig. 3. Accordingly, the dense motion
vector field will correspond to an object or other image portion of arbitrary
configuration or size.

Function block 564 indicates that the configuration of the dense motion 
vector field is extrapolated to a regular, preferably rectangular, configuration to 
facilitate encoding or compression. Preferably, the dense motion vector field 
configuration is extrapolated to a regular configuration by precompression 
extrapolation method 400 described with reference to Figs. 17A and 17B. It will be 
appreciated that conventional extrapolation methods, such as a mirror image method, 
could alternatively be utilized. 

Function block 566 indicates that the dense motion vector field with its 
extrapolated regular configuration is encoded or compressed according to 
conventional encoding transformations such as, for example, discrete cosine 
transformation (DCT) or lattice or other wavelet compression, the former of which is 
preferred. 
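To make the preference for DCT concrete, the following minimal Python sketch applies an orthonormal DCT-II separately to the horizontal and vertical components of one block of an extrapolated dense motion vector field. The 8 x 8 block size, the separate treatment of the two components, and the helper names are assumptions for illustration. They are not details taken from the description above. A uniform field, the limiting case of a smooth one, places all of its energy in the DC coefficient.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of an n x n block given as a list of rows."""
    n = len(block)

    def alpha(k):
        # Normalisation factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def encode_motion_field(dx, dy):
    """Transform the horizontal (dx) and vertical (dy) components separately."""
    return dct2(dx), dct2(dy)


# A uniform field of (2, -1) motion vectors over one 8 x 8 block.
cx, cy = encode_motion_field([[2.0] * 8 for _ in range(8)],
                             [[-1.0] * 8 for _ in range(8)])
print(round(cx[0][0], 6), round(cy[0][0], 6))  # 16.0 -8.0
```

Every other coefficient in this example is zero up to rounding error. A field that varies slowly behaves similarly, with most of its energy in a few low-frequency coefficients, and those coefficients are what the subsequent lossless stage has to code.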

Function block 568 indicates that the encoded dense motion vector field is 
further compressed or encoded by a conventional lossless still image compression 
method such as entropy encoding to form an encoded dense motion vector field 570. 
Such a still image compression method is described with reference to function block 
114 of Fig. 3. 
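Entropy encoding gives short codes to frequent symbols. The sketch below is a hypothetical illustration, not the coder used by the invention. It uniformly quantizes transform coefficients and computes their zero-order Shannon entropy. That entropy is the lower bound on average bits per coefficient that a memoryless entropy coder, such as a Huffman or arithmetic coder, can approach. The quantization step is an assumption, because the description does not specify one.

```python
import math
from collections import Counter


def quantize(coeffs, step):
    """Uniform scalar quantization of a 2-D coefficient array (lossy stage)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def zero_order_entropy(symbols):
    """Empirical Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


q = quantize([[16.0, 0.2], [-0.3, 4.1]], step=4.0)
flat = [s for row in q for s in row]
print(q, zero_order_entropy(flat))  # [[4, 0], [0, 1]] 1.5
```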

COMPRESSION OF QUANTIZED OBJECTS FROM PREVIOUS VIDEO FRAMES 

Referring to Fig. 3A, video compression encoder process 64 uses quantized 
prior object 126 determined with reference to a prior frame N-1 to encode a 
corresponding object in a next successive frame N. As a consequence, encoder 
process 64 requires that quantized prior object 126 be stored in an accessible 
memory buffer. With conventional video display resolutions, such a memory buffer 
would require a capacity of at least one-half megabyte to store the quantized prior 
object 126 for a single video frame. Higher resolution display formats would 
require correspondingly larger memory buffers. 

Fig. 22 is a functional block diagram of a quantized object encoder-decoder 
(codec) process 600 that compresses and selectively decompresses quantized prior 
objects 126 to reduce the required capacity of a quantized object memory buffer. 

Function block 602 indicates that each quantized object 126 in an image 
frame is encoded in a block-by-block manner by a lossy encoding or compression 
method such as discrete cosine transform (DCT) encoding or lattice sub-band or 
other wavelet compression. As shown in Fig. 21, lossy encoded information can 
undergo additional lossless encoding. Alternatively, lossless encoding alone can be 
used. 

Function block 604 indicates that the encoded or compressed quantized 
objects are stored in a memory buffer (not shown). 

Function block 606 indicates that encoded quantized objects are retrieved 
from the memory buffer in anticipation of processing a corresponding object in a 
next successive video frame. 

Function block 608 indicates that the encoded quantized object is inverse 
encoded by, for example, DCT or wavelet decoding according to the encoding 
processes employed with respect to function block 602. 
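The store-and-retrieve cycle of codec process 600 can be sketched in Python. Here the standard-library zlib module stands in for the lossless-only alternative mentioned above. The class name, the use of zlib, and the representation of a quantized object as a flat list of 8-bit pixel values are all assumptions for illustration. In the preferred arrangement, a lossy DCT or wavelet stage would come before the lossless stage.

```python
import zlib


class QuantizedObjectBuffer:
    """Hypothetical memory buffer for quantized prior objects (codec 600).

    Each object is a flat list of 8-bit pixel values; zlib provides the
    lossless-only encoding alternative.
    """

    def __init__(self):
        self._store = {}

    def put(self, object_id, pixels):
        # Function blocks 602 and 604: encode the object and store the result.
        self._store[object_id] = zlib.compress(bytes(pixels), 9)

    def get(self, object_id):
        # Function blocks 606 and 608: retrieve and decode ahead of the next frame.
        return list(zlib.decompress(self._store[object_id]))

    def stored_size(self, object_id):
        # Bytes actually held in the buffer for this object.
        return len(self._store[object_id])
```

Because this variant is lossless, `get` returns exactly what `put` stored. Any saving in buffer capacity therefore depends entirely on how redundant the quantized object is.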

Codec process 600 allows the capacity of the corresponding memory buffer 
to be reduced by up to about 80%, depending upon the overall video compression 
ratio and the desired quality of the resultant video. Moreover, it will be appreciated 
that codec process 600 is similarly applicable to the decoder process corresponding 
to video compression encoder process 64. 



VIDEO COMPRESSION DECODER PROCESS OVERVIEW 

Video compression encoder process 64 of Fig. 3 provides encoded or 
compressed representations of video signals corresponding to video sequences of 
multiple image frames. The compressed representations include object masks 66, 
feature points 68, affine transform coefficients 104, and compressed error data 116 
from encoder process 64 and compressed master objects 136 from encoder process 
130. These compressed representations facilitate storage or transmission of video 
information and are capable of achieving compression ratios of up to 300 percent 
greater than those achievable by conventional video compression methods such as 
MPEG-2. 

It will be appreciated, however, that retrieving such compressed video 
information from data storage or receiving transmission of the video information 
requires that it be decoded or decompressed to reconstruct the original video signal 
so that it can be rendered by a display device such as video display device 52 (Figs. 
2A and 2B). As with conventional encoding processes such as MPEG-1, MPEG-2, 
and H.26X, the decompression or decoding of the video information is substantially 
the inverse of the process by which the original video signal is encoded or 
compressed. 

Fig. 23A is a functional block diagram of a video compression decoder 
process 700 for decompressing video information generated by video compression 
encoder process 64 of Fig. 3. For purposes of consistency with the description of 
encoder process 64, decoder process 700 is described with reference to Figs. 2A and 
2B. Decoder process 700 retrieves from memory or receives as a transmission 
encoded video information that includes object masks 66, feature points 68, 
compressed master objects 136, affine transform coefficients 104, and compressed 
error data 116. 

Decoder process 700 performs operations that are the inverse of those of 
encoder process 64 (Fig. 3). Accordingly, each of the above-described preferred 
operations of encoder process 64 that has a decoding counterpart would similarly be 
inverted. 

Function block 702 indicates that masks 66, feature points 68, transform 
coefficients 104, and compressed error data 116 are retrieved from memory or 
received as a transmission for processing by decoder process 700. 

Fig. 23B is a functional block diagram of a master object decoder process 
704 for decoding or decompressing compressed master object 136. Function block 
706 indicates that compressed master object data 136 are entropy decoded by the 
inverse of the conventional lossless entropy encoding method in function block 134 
of Fig. 3B. Function block 708 indicates that the entropy decoded master object 
from function block 706 is decoded according to an inverse of the conventional 
lossy wavelet encoding process used in function block 132 of Fig. 3B. 

Function block 712 indicates that dense motion transformations, preferably 
multi-dimensional affine transformations, are generated from affine coefficients 104. 
Preferably, affine coefficients 104 are quantized in accordance with transformation 
method 350 (Fig. 12), and the affine transformations are generated from the 
quantized affine coefficients by performing the inverse of the operations described 
with reference to function block 362 (Fig. 12). 

Function block 714 indicates that a quantized form of an object 716 in a 
prior frame N-1 (e.g., rectangular solid object 56a in image frame 54a) provided via 
a timing delay 718 is transformed by the dense motion transformation to provide a 
predicted form of the object 720 in a current frame N (e.g., rectangular solid object 
56b in image frame 54b). 

Function block 722 indicates that for image frame N, predicted current object 
720 is added to a quantized error 724 generated from compressed error data 116. In 
particular, function block 726 indicates that compressed error data 116 is decoded by 
an inverse process to that of compression process 114 (Fig. 3A). In the preferred 
embodiment, function blocks 114 and 726 are based upon a conventional lossless 
still image compression method such as entropy encoding. 

Function block 728 indicates that the entropy decoded error data from 
function block 726 is further decompressed or decoded by a conventional lossy still 
image compression method corresponding to that utilized in function block 112 (Fig. 
3A). In the preferred embodiment, the decompression or decoding of function block 
728 is by a lattice subband or other wavelet process or a discrete cosine transform 
(DCT) process. 

Function block 722 provides quantized object 730 for frame N as the sum of 
predicted object 720 and quantized error 724, representing a reconstructed or 
decompressed object 732 that is delivered to timing delay 718 for reconstruction of 
the object in subsequent frames. 

Function block 734 indicates that quantized object 732 is assembled with 
other objects of a current image frame N to form a decompressed video signal. 
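The feedback loop of decoder process 700 can be sketched as follows. In that loop, each reconstructed object passes through timing delay 718 and becomes the reference for the next frame. The `warp` argument stands in for the dense motion transformation of function block 714. The one-dimensional cyclic shift used here is a hypothetical toy, and the decoded quantized errors are assumed to have already passed through function blocks 726 and 728.

```python
def shift(pixels, k):
    """Toy warp (hypothetical): cyclically shift a 1-D row right by k places."""
    k %= len(pixels)
    return pixels[-k:] + pixels[:-k] if k else list(pixels)


def decode_object_sequence(first_object, transforms, errors, warp=shift):
    """Rebuild one object frame by frame, feeding each result back as the reference.

    first_object: the object in the first frame (assumed already decoded).
    transforms:   one motion transformation per later frame.
    errors:       one decoded quantized error per later frame.
    warp:         callable(pixels, transform) -> predicted pixels.
    """
    prior = first_object                      # output of timing delay 718
    frames = [list(first_object)]
    for transform, error in zip(transforms, errors):
        predicted = warp(prior, transform)    # function block 714
        current = [p + e for p, e in zip(predicted, error)]  # function block 722
        frames.append(current)
        prior = current                       # becomes frame N-1 for the next pass
    return frames


print(decode_object_sequence([10, 20, 30, 40], [1, 1], [[0, 0, 1, 0], [0, 0, 0, 0]]))
# [[10, 20, 30, 40], [40, 10, 21, 30], [30, 40, 10, 21]]
```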


SIMPLIFIED CHAIN ENCODING 

Masks, objects, sprites, and other graphical features commonly are 
represented by their contours. As shown in and explained with reference to FIG. 
5A, for example, rectangular solid object 56a is bounded by an object perimeter or 
contour 142. A conventional process of encoding or compressing contours is 
referred to as chain encoding. 

FIG. 24A shows a conventional eight-point chain code 800 from which 
contours on a conventional rectilinear pixel array are defined. Based upon a 
current pixel location X, a next successive pixel location in the contour extends in 
one of directions 802a-802h. The chain code value for the next successive pixel is 
the numeric value corresponding to the particular direction 802. As examples, the 
right, horizontal direction 802a corresponds to the chain code value 0, and the 
downward, vertical direction 802g corresponds to the chain code value 6. Any 
continuous contour can be described from eight-point chain code 800. 

With reference to FIG. 24B, a contour 804 represented by pixels 806 
designated X and A-G can be encoded in a conventional manner by the chain code 
sequence {00764432}. In particular, beginning from pixel X, pixels A and B are 
positioned in direction 0 relative to respective pixels X and A. Pixel C is positioned 
in direction 7 relative to pixel B. Remaining pixels D-G are similarly positioned in 
directions corresponding to the chain code values listed above. In a binary 
representation, each conventional chain code value is represented by three digital 
bits. 
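A minimal Python sketch of the conventional eight-point chain code follows. The description fixes only direction 0 (right) and direction 6 (down). Numbering the remaining directions counterclockwise in 45-degree steps, with the y axis pointing up, is an assumption, though it is consistent with those two values. The function names are hypothetical.

```python
# Offsets (dx, dy) for chain code values 0-7, y axis pointing up (assumed numbering).
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]
CODE_OF = {d: i for i, d in enumerate(DIRECTIONS)}


def chain_encode(pixels):
    """Return the chain code values for a path of 8-connected pixels."""
    codes = []
    for (x0, y0), (x1, y1) in zip(pixels, pixels[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in CODE_OF:
            raise ValueError(f"pixels {(x0, y0)} and {(x1, y1)} are not adjacent")
        codes.append(CODE_OF[step])
    return codes


def chain_decode(start, codes):
    """Rebuild the pixel path from a start pixel and its chain code values."""
    x, y = start
    path = [(x, y)]
    for c in codes:
        dx, dy = DIRECTIONS[c]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def pack_3bit(codes):
    """Conventional binary form: three bits per chain code value."""
    return "".join(format(c, "03b") for c in codes)
```

Under this numbering, decoding the FIG. 24B sequence {00764432} from pixel X visits eight distinct pixels and returns to X. The packed form occupies 8 × 3 = 24 bits.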

FIG. 25A is a functional block diagram of a chain code process 810 of the 
present invention capable of providing contour compression ratios at least about 
twice those of conventional chain code processes. Chain code process 810 achieves 
such improved compression ratios by limiting the number of chain codes and 
defining them relative to the alignment of adjacent pairs of pixels. Based upon 
experimentation, it has been discovered that the limited chain codes of chain code 
process 810 directly represent more than 99.8% of pixel alignments of object or 
mask contours. Special case chain code modifications accommodate the remaining 
less than 0.2% of pixel alignments as described below in greater detail. 

Function block 816 indicates that a contour is obtained for a mask, object, or 
sprite. The contour may be obtained, for example, by object segmentation process 
140 described with reference to FIGS. 4 and 5. 

Function block 818 indicates that an initial pixel in the contour is identified. 
The initial pixel may be identified by common methods such as, for example, a pixel 
with minimal X-axis and Y-axis coordinate positions. 

Function block 820 indicates that a predetermined chain code is assigned to 
represent the relationship between the initial pixel and the next adjacent pixel in the 
contour. Preferably, the predetermined chain code is defined to correspond to the 
forward direction. 

FIG. 25B is a diagrammatic representation of a three-point chain code 822 of 
the present invention. Chain code 822 includes three chain codes 824a, 824b, and 
824c that correspond to a forward direction 826a, a leftward direction 826b, and a 
rightward direction 826c, respectively. Directions 826a-826c are defined relative to 
a preceding alignment direction 828 between a current pixel 830 and an adjacent 
pixel 832 representing the preceding pixel in the chain code. 

Preceding alignment direction 828 may extend in any of the directions 802 
shown in Fig. 24A, but is shown with a specific orientation (i.e., right, horizontal) 
for purposes of illustration. Direction 826a is defined, therefore, as the same as 
direction 828. Directions 826b and 826c differ from direction 828 by leftward and 
rightward displacements of one pixel. 

It has been determined experimentally that slightly more than 50% of chain 
codes 824 correspond to forward direction 826a, and slightly less than 25% of chain 
codes 824 correspond to each of directions 826b and 826c. 

Function block 836 represents an inquiry as to whether the next adjacent 
pixel in the contour conforms to one of directions 826. Whenever the next adjacent 
pixel in the contour conforms to one of directions 826, function block 836 proceeds 
to function block 838, and otherwise proceeds to function block 840. 

Function block 838 indicates that the next adjacent pixel is assigned a chain 
code 824 corresponding to its direction 826 relative to the direction 828 along which 
the adjacent preceding pair of pixels are aligned. 
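The following sketch converts absolute eight-point codes into the three relative codes of FIG. 25B by comparing each move with the one before it. It assumes the counterclockwise numbering 0 = right, 6 = down. Under that numbering, a leftward displacement of one pixel is a +1 change modulo 8 and a rightward one is a -1 change. The symbols "F", "L", and "R" are hypothetical labels for chain codes 824a, 824b, and 824c. Any other turn is reported as nonconformal, which is where function block 840 would apply a special-case modification. Carrying the first move's absolute code as the reference alignment is also an assumption of this sketch.

```python
FORWARD, LEFT, RIGHT = "F", "L", "R"  # hypothetical labels for codes 824a-824c


def to_relative(codes):
    """Convert absolute eight-point chain codes to three-point relative codes.

    Returns (first_code, relative_symbols, nonconformal_indices). A relative
    symbol of None marks a move that needs a special-case modification.
    """
    relative, nonconformal = [], []
    for i in range(1, len(codes)):
        turn = (codes[i] - codes[i - 1]) % 8
        if turn == 0:
            relative.append(FORWARD)       # same as preceding direction 828
        elif turn == 1:
            relative.append(LEFT)          # one step counterclockwise
        elif turn == 7:
            relative.append(RIGHT)         # one step clockwise
        else:
            relative.append(None)          # turn of 90 degrees or more
            nonconformal.append(i)
    return codes[0], relative, nonconformal
```

When the sketch is applied to the FIG. 24B sequence {00764432}, it flags only the change from direction 6 to direction 4, which is a 90-degree turn.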

Function block 840 indicates that a pixel sequence conforming to one of 
directions 826 is substituted for the actual nonconformal pixel sequence. Based 
upon experimentation, it has been determined that such substitutions typically will 
arise in fewer than 0.2% of pixel sequences in a contour and may be accommodated 
by one of six special-case modifications. 

FIG. 25C is a diagrammatic representation of the six special-case 
modifications 842 for converting nonconformal pixel sequences to pixel sequences 
that conform to directions 826. Within each modification 842, a pixel sequence 844 
is converted to a pixel sequence 846. In each of pixel sequences 844 of adjacent 
respective pixels X1, X2, A, B, the direction between pixels A and B does not 
conform to one of directions 826 due to the alignment of pixel A relative to the 
alignment of pixels X1 and X2. 

In pixel sequence 844a, initial pixel alignments 850a and 852a represent a 
nonconformal right-angle direction change. Accordingly, in pixel sequence 846a, 
pixel A of pixel sequence 844a is omitted, resulting in a pixel direction 854a that 
conforms to pixel direction 826a. Pixel sequence modifications 842b-842f similarly 
convert nonconformal pixel sequences 844b-844f to conformal sequences 846b-846f, 
respectively. 

Pixel sequence modifications 842 omit pixels that cause pixel direction 
alignments that change by 90° or more relative to the alignments of adjacent 
preceding pixels X1 and X2. One effect is to increase the minimum radius of 
curvature of a contour representing a right angle to three pixels. Pixel modifications 
842 therefore cause a minor loss of extremely fine contour detail. According to 
this invention, however, it has been determined that the loss of such details is 
acceptable under most viewing conditions. 

Function block 860 represents an inquiry as to whether there is another pixel 
in the contour to be assigned a chain code. Whenever there is another pixel in the 
contour to be assigned a chain code, function block 860 returns to function block 836, 
and otherwise proceeds to function block 862. 

Function block 862 indicates that nonconformal pixel alignment directions 
introduced or incurred by the process of function block 840 are removed. In a 
preferred embodiment, the nonconformal direction changes may be omitted simply 
by returning to function block 816 and repeating process 810 until no nonconformal 
pixel sequences remain, which typically is achieved in fewer than 8 iterations. In an 
alternative embodiment, such incurred nonconformal direction changes may be 
corrected in "real-time" by checking for and correcting any incurred nonconformal 
direction changes each time a nonconformal direction change is modified. 

Function block 864 indicates that a Huffman code is generated from the 
resulting simplified chain code. With chain codes 824a-824c corresponding to 
directions 826a-826c that occur for about 50%, 25%, and 25% of pixels in a 
contour, respective Huffman codes of 0, 11, and 10 are assigned. Such first order 
Huffman codes allow chain process 810 to represent contours at a bit rate of less 
than 1.5 bits per pixel in the contour. At the nominal frequencies, the expected code 
length is 0.5 × 1 + 0.25 × 2 + 0.25 × 2 = 1.5 bits, and the one-bit forward code occurs 
slightly more than half the time. Such a bitrate represents approximately a 50% 
compression ratio improvement over the 3 bits per pixel of conventional chain code 
processes. 
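The first order Huffman code of function block 864 is small enough to write out directly. In this sketch, the symbols "F", "L", and "R" are hypothetical labels for chain codes 824a-824c, and they map to the code words 0, 11, and 10 given above. Because the code is prefix-free, a decoder can emit a symbol as soon as the accumulated bits match a code word.

```python
HUFFMAN = {"F": "0", "L": "11", "R": "10"}  # forward, leftward, rightward


def huffman_encode(symbols):
    """Concatenate the code word for each relative chain code symbol."""
    return "".join(HUFFMAN[s] for s in symbols)


def huffman_decode(bits):
    """Decode a bit string; prefix-freeness makes the first match correct."""
    inverse = {word: sym for sym, word in HUFFMAN.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    if word:
        raise ValueError("trailing bits do not form a complete code word")
    return out


def expected_bits_per_pixel(p_forward, p_left, p_right):
    """Average code length for the given symbol probabilities."""
    return (p_forward * len(HUFFMAN["F"]) + p_left * len(HUFFMAN["L"])
            + p_right * len(HUFFMAN["R"]))
```

Consider illustrative probabilities of 0.52, 0.24, and 0.24, which match "slightly more than 50%" forward. With those values the expected length is about 1.48 bits per pixel, compared with 3 bits for the conventional eight-point code.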

It will be appreciated that higher order Huffman coding can provide higher 
compression ratios. Higher order Huffman coding includes, for example, assigning 
predetermined values to preselected sequences of first order Huffman codes. 

SPRITE GENERATION 

The present invention includes generating sprites for use in connection with 
encoding determinate motion video (movie). Bitmaps are accreted into bitmap series 
that comprise a plurality of sequential bitmaps of sequential images from an image 
source. Accretion is used to overcome the problem of occluded pixels where objects 
or figures move relative to one another or where one figure occludes another, similar 
to the way a foreground figure occludes the background. For example, when a 
foreground figure moves and reveals some new background, there is no way to build 
that new background from a previous bitmap unless the previous bitmap was first 
enhanced by including in it the pixels that were going to be uncovered in the 
subsequent bitmap. This method takes an incomplete image of a figure and looks 
forward in time to find any pixels that belong to the image but are not 
immediately visible. Those pixels are used to create a composite bitmap for the 
figure. With the composite bitmap, any future view of the figure can be created by 
distorting the composite bitmap. 

The encoding process begins with an operator identifying the figures and the 
parts of the figures of a current bitmap from a current bitmap series. Feature or 
distortion points are selected by the operator on the features of the parts about which 
the parts of the figures move. A current grid of triangles is superimposed onto the 
parts of the current bitmap. The triangles that constitute the current grid of triangles 
are formed by connecting adjacent distortion points. The distortion points are the 
vertices of the triangles. The current location of each triangle on the current bitmap 
is determined and stored to the storage device. A portion of data of the current 
bitmap that defines the first image within the current location of each triangle is 
retained for further use. 

A succeeding bitmap that defines a second image of the current bitmap series 
is received from the image source, and the figures and the parts of the figures are 
identified by the operator. Next, the current grid of triangles from the current 
bitmap is superimposed onto the succeeding bitmap. The distortion points of the current 
grid of triangles are realigned to coincide with the features of the corresponding 
figures on the succeeding bitmap. The realigned distortion points form a succeeding 
grid of triangles on the succeeding bitmap of the second image. The succeeding 
location of each triangle on the succeeding bitmap is determined and stored to the 
storage device. A portion of data of the succeeding bitmap that defines the second 
image within the succeeding location of each triangle is retained for further use. 

The process of determining and storing the current and succeeding locations 
of each triangle is repeated for the plurality of sequential bitmaps of the current 
bitmap series. When that process is completed, an average image of each triangle in 
the current bitmap series is determined from the separately retained data. The 
average image of each triangle is stored to the storage device. 

During playback, the average image of each triangle of the current bitmap 
series and the current location of each triangle of the current bitmap are retrieved 
from the storage device. A predicted bitmap is generated by calculating a 
transformation solution for transforming the average image of each triangle in the 
current bitmap series to the current location of each triangle of the current bitmap 
and applying the transformation solution to the average image of each triangle. The 
predicted bitmap is passed to the monitor for display. 
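Three vertex correspondences fix exactly the six coefficients of a two-dimensional affine map, so the per-triangle transformation solution can be computed in closed form. The description later identifies this solution as affine. The sketch below uses Cramer's rule, which is one way to solve it and is not drawn from the description. The function names are hypothetical.

```python
def affine_from_triangles(src, dst):
    """Return (a, b, c, d, e, f) mapping each src vertex onto its dst vertex,
    where x' = a*x + b*y + c and y' = d*x + e*y + f."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)
    if det == 0:
        raise ValueError("source triangle is degenerate")

    def solve(v1, v2, v3):
        # Cramer's rule for p*x + q*y + r = v at the three source vertices.
        p = (v1 * (y2 - y3) + v2 * (y3 - y1) + v3 * (y1 - y2)) / det
        q = (x1 * (v2 - v3) + x2 * (v3 - v1) + x3 * (v1 - v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2) + x2 * (y3 * v1 - y1 * v3)
             + x3 * (y1 * v2 - y2 * v1)) / det
        return p, q, r

    (u1, w1), (u2, w2), (u3, w3) = dst
    a, b, c = solve(u1, u2, u3)
    d, e, f = solve(w1, w2, w3)
    return a, b, c, d, e, f


def apply_affine(coeffs, point):
    """Map a single (x, y) point through the affine coefficients."""
    a, b, c, d, e, f = coeffs
    x, y = point
    return a * x + b * y + c, d * x + e * y + f
```

To render a predicted bitmap, implementations commonly solve for the inverse map, from the destination triangle back to the average image. They then sample the source at each destination pixel, which avoids holes. The description leaves that choice open.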

In connection with a playback determinate motion video (video game) in 
which the images are determined by a controlling program at playback, a sprite 
bitmap is stored in its entirety on a storage device. The sprite bitmap comprises a 
plurality of data bits that define a sprite image. The sprite bitmap is displayed on a 
monitor, and the parts of the sprite are identified by an operator and distortion points 
are selected for the sprite's parts. 

A grid of triangles is superimposed onto the parts of the sprite bitmap. The 
triangles that constitute the grid of triangles are formed by connecting adjacent 
distortion points. The distortion points are the vertices of the triangles. The 
location of each triangle of the sprite bitmap is determined and stored to the storage 
device. 

During playback, a succeeding location of each triangle is received from a 
controlling program. The sprite bitmap and the location of each triangle 
on the sprite bitmap are recalled from the storage device and passed to the display 
processor. The succeeding location of each triangle is also passed to the display 
processor. 

A transformation solution is calculated for each triangle on the sprite bitmap. 
A succeeding bitmap is then generated in the display processor by applying the 
transformation solution of each triangle to the data of the sprite bitmap that defines 
the sprite image within the location of each triangle. The display processor passes 
the succeeding sprite bitmap to a monitor for display. This process is repeated for 
each succeeding location of each triangle requested by the controlling program. 

As shown in Fig. 26, an encoding procedure for a movie motion video begins 
at step 900 with the CPU 22 receiving from an image source a current bitmap series. 
The current bitmap series comprises a plurality of sequential bitmaps of sequential 
images. The current bitmap series has a current bitmap that comprises a plurality of 
data bits which define a first image from the image source. The first image 
comprises at least one figure having at least one part. 

Proceeding to step 902, the first image is displayed to the operator on the 
monitor 28. From the monitor 28, the figures of the first image on the current 
bitmap are identified by the operator. The parts of the figure on the current bitmap 
are then identified by the operator at step 904. 

Next, at step 906, the operator selects feature or distortion points on the 
current bitmap. The distortion points are selected so that the distortion points 
coincide with features on the bitmap where relative movement of a part is likely to 
occur. It will be understood by those skilled in the art that the figures, the parts of 
the figures and the distortion points on a bitmap may be identified by the computer 
system 20 or by assistance from it. It is preferred, however, that the operator 
identify the figures, the parts of the figures and the distortion points on a bitmap. 

Proceeding to step 908, a current grid of triangles is superimposed onto the 
parts of the current bitmap by the computer system 20. With reference to Fig. 27A, 
the current grid comprises triangles formed by connecting adjacent distortion points. 
The distortion points form the vertices of the triangles. More specifically, the first 
image of the current bitmap comprises a figure, which is a person 970. The person 
970 has six parts corresponding to a head 972, a torso 974, a right arm 976, a left 
arm 978, a right leg 980, and a left leg 982. Distortion points are selected on each 
part of the person 970 so that the distortion points coincide with features where 
relative movement of a part is likely to occur. A current grid is superimposed over 
each part with the triangles of each current grid formed by connecting adjacent 
distortion points. Thus, the distortion points form the vertices of the triangles. 

At step 910, the computer system 20 determines a current location of each 
triangle on the current bitmap. The current location of each triangle on the current 
bitmap is defined by the location of the distortion points that form the vertices of the 
triangle. At step 912, the current location of each triangle is stored to the storage 
device. A portion of data derived from the current bitmap that defines the first 
image within the current location of each triangle is retained at step 914. 

Next, at step 916, a succeeding bitmap of the current bitmap series is 
received by the CPU 22. The succeeding bitmap comprises a plurality of data bits 
which define a second image of the current bitmap series. The second image may or 
may not include figures that correspond to the figures in the first image. For the 
following steps, the second image is assumed to have figures that correspond to the 
figures in the first image. At step 918, the current grid of triangles is superimposed 
onto the succeeding bitmap. The second image with the superimposed triangular 
grid is displayed to the operator on the monitor 28. 

At step 920, the distortion points are realigned to coincide with 
corresponding features on the succeeding bitmap by the operator with assistance 
from the computer system 20. The computer system 20 realigns the distortion points 
using block matching. Any mistakes are corrected by the operator. With reference to Fig. 
27B, the realigned distortion points form a succeeding grid of triangles. The 
realigned distortion points are the vertices of the triangles. More specifically, the 
second image of the succeeding bitmap of person 970 includes head 972, torso 974, 
right arm 976, left arm 978, right leg 980, and left leg 982. In the second image, 
however, the right arm 976 is raised. The current grids of the first image have been 
superimposed over each part and their distortion points realigned to coincide with 
corresponding features on the second image. The realigned distortion points define 
succeeding grids of triangles. The succeeding grids comprise triangles formed by 
connecting the realigned distortion points. Thus, the realigned distortion points form 
the vertices of the triangles of the succeeding grids. 
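The description states only that the computer system realigns distortion points by block matching. In the following minimal sketch, the sum of absolute differences (SAD) cost, the square window radius `r`, the search range `search`, and all function names are assumptions for illustration. Images are lists of rows indexed as `img[row][col]`, and points are `(col, row)`.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def patch(img, x, y, r):
    """(2r+1) x (2r+1) patch centred on column x, row y (no border padding)."""
    return [row[x - r:x + r + 1] for row in img[y - r:y + r + 1]]


def realign_point(prev_img, next_img, point, r=1, search=2):
    """Move a distortion point to its best-matching position in next_img."""
    x, y = point
    ref = patch(prev_img, x, y, r)
    h, w = len(next_img), len(next_img[0])
    best, best_cost = point, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            nx, ny = x + dx, y + dy
            # Only consider candidates whose whole window lies inside the image.
            if r <= nx < w - r and r <= ny < h - r:
                cost = sad(ref, patch(next_img, nx, ny, r))
                if best_cost is None or cost < best_cost:
                    best, best_cost = (nx, ny), cost
    return best
```

An operator would still review the result, because block matching can lock onto a similar-looking region when the true feature moves beyond the search range or changes appearance.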

Proceeding to step 922, a succeeding location of each triangle of the 
succeeding bitmap is determined by the computer system 20. At step 924, the 
succeeding location of each triangle on the succeeding bitmap is stored to the storage 
device. A portion of data derived from the succeeding bitmap that defines the 
second image within the succeeding location of each triangle is retained at step 926. 
Step 926 leads to decisional step 928 where it is determined if a next succeeding 
bitmap exists. 

If a next succeeding bitmap exists, the YES branch of decisional step 928 
leads to step 930 where the succeeding bitmap becomes the current bitmap. Step 
930 returns to step 916 where a succeeding bitmap of the current bitmap series is 
received by the CPU 22. If a next succeeding bitmap does not exist, the NO branch 
of decisional step 928 leads to step 932 where an average image for each triangle of 
the current bitmap series is determined. The average image is the median value of 
the pixels of a triangle. Use of the average image makes the process less susceptible 
to degeneration. Proceeding to step 934, the average image of each triangle of the 
current bitmap series is stored to the storage device. 
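The description defines the average image of a triangle as a median. The minimal sketch below takes the per-position median with Python's `statistics` module. It assumes that the retained triangle data from each frame have already been resampled to a common reference triangle, so that position k refers to the same point in every frame. That resampling step and the function name are assumptions.

```python
from statistics import median


def average_triangle_image(samples):
    """Per-position median across frames for one triangle.

    samples: one list of pixel values per frame, all (by assumption) resampled
    to the same reference triangle so that index k is the same point each time.
    """
    if not samples:
        raise ValueError("need at least one frame")
    return [median(values) for values in zip(*samples)]


frames = [[10, 20, 30], [12, 200, 30], [11, 21, 0]]
print(average_triangle_image(frames))  # [11, 21, 30]
```

In the example, the outlying value 200 in the second frame does not affect the result, whereas a mean would be pulled toward it. An outlier of this kind could come from a briefly occluding object. This is one plausible reading of why the median makes the process less susceptible to degeneration.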

Next, at step 936, the current location of each triangle on the current bitmap 
is retrieved from the storage device. An affine transformation solution for 
transforming the average image of each triangle to the current location of the 
triangle on the current bitmap is then calculated by the computer system 20 at step 
938. At step 940, a predicted bitmap is generated by applying the transformation 
solution of the average image of each triangle to the current location of each triangle 
on the current bitmap. The predicted bitmap is compared with the current bitmap at 
step 942. 

At step 944 a correction bitmap is generated. The correction bitmap 
comprises the data bits of the current bitmap that were not accurately predicted by 
the predicted bitmap. The correction bitmap is stored to the storage device at step 
948. Step 948 leads to decisional step 950 where it is determined if a succeeding 
bitmap exists. 
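The description does not say how "accurately predicted" is judged. The sketch below assumes a per-pixel absolute-difference threshold, given by the hypothetical `tolerance` parameter. It stores the correction bitmap sparsely as a mapping from (row, col) to the current pixel value.

```python
def correction_bitmap(current, predicted, tolerance=0):
    """Keep current pixels whose prediction error exceeds `tolerance`.

    Returns {(row, col): current_value}; every pixel not in the mapping is
    treated as accurately predicted. `tolerance` is a hypothetical parameter.
    """
    corrections = {}
    for r, (cur_row, pred_row) in enumerate(zip(current, predicted)):
        for c, (cur, pred) in enumerate(zip(cur_row, pred_row)):
            if abs(cur - pred) > tolerance:
                corrections[(r, c)] = cur
    return corrections
```

For example, if `current` is `[[10, 10], [50, 10]]` and `predicted` is `[[10, 12], [10, 10]]`, a tolerance of 2 keeps only `{(1, 0): 50}`. A tolerance of 0 also keeps `(0, 1)`. A larger tolerance shrinks the correction data at the cost of fidelity.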

If a succeeding bitmap exists, the YES branch of decisional step 950 leads to 
step 952 where the succeeding bitmap becomes the current bitmap. Step 952 returns 
to step 936 where the current location of each triangle on the current bitmap is 
retrieved from the storage device. If a next succeeding bitmap does not exist, the 
NO branch of decisional step 950 leads to decisional step 954 where it is determined 
if a succeeding bitmap series exists. If a succeeding bitmap series does not exist, 
encoding is finished and the NO branch of decisional step 954 leads to step 956. If 
a succeeding bitmap series exists, the YES branch of decisional step 954 leads to 
step 958 where the CPU 22 receives the succeeding bitmap series as the current 
bitmap series. Step 958 returns to step 902 where the figures of the first image of 
the current bitmap series are identified by the operator. 

The process of Fig. 26 describes generation of a sprite or master object 90 
for use by encoder process 64 of Fig. 3. The process of utilizing master object 90 
to form predicted objects 102 is described with reference to Fig. 28. 

As shown in Fig. 28, the procedure begins at step 1000 with a current bitmap 
series being retrieved. The current bitmap series comprises a plurality of sequential 
bitmaps of sequential images. The current bitmap series has a current bitmap that 
comprises a plurality of data bits which define a first image from the image source. 
The first image comprises at least one figure having at least one part. 

At step 1002, the average image of each triangle of the current bitmap series 
is retrieved from the storage device. The average image of each triangle is then 
passed to a display processor (not shown) at step 1004. It will be appreciated that 
computer system 20 (Fig. 1) can optionally include a display processor or other 
dedicated components for executing the processes of this invention. Proceeding to 
step 1006, the current location of each triangle on the current bitmap is retrieved 
from the storage device. The current location of each triangle is passed to the 
display processor at step 1008. 

Next, an affine transformation solution for transforming the average image of 
each triangle to the current location of each triangle on the current bitmap is 
calculated by the display processor at step 1010. Proceeding to step 1012, a 
predicted bitmap is generated by the display processor by applying the 
transformation solution for transforming the average image of each triangle to the 
current location of each triangle on the current bitmap. 

At step 1014, a correction bitmap for the current bitmap is retrieved from the 
storage device. The correction bitmap is passed to the display processor at step 1016. 
A display bitmap is then generated in the display processor by overlaying the 
predicted bitmap with the correction bitmap. The display processor retains a copy of 
the average image of each triangle and passes the display bitmap to the frame buffer 
for display on the monitor. 
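Overlaying the predicted bitmap with the correction bitmap amounts to replacing predicted pixels wherever a correction exists. This sketch assumes that the correction bitmap is held as a sparse mapping from (row, col) to the replacement pixel value. The function name is hypothetical.

```python
def display_bitmap(predicted, corrections):
    """Overlay a sparse correction bitmap onto a predicted bitmap.

    predicted:   list of rows of pixel values.
    corrections: {(row, col): value} for pixels the prediction got wrong.
    """
    out = [list(row) for row in predicted]  # copy so the prediction is untouched
    for (r, c), value in corrections.items():
        out[r][c] = value
    return out


print(display_bitmap([[10, 12], [10, 10]], {(1, 0): 50}))  # [[10, 12], [50, 10]]
```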

Next, at decisional step 1020, it is determined if a succeeding bitmap of the 
current bitmap series exists. If a succeeding bitmap of the current bitmap series 
exists, the YES branch of decisional step 1020 leads to step 1022. At step 1022, the 
succeeding bitmap becomes the current bitmap. Step 1022 returns to step 1006 
where the location of each triangle on the current bitmap is retrieved from the 
storage device. 

Returning to decisional step 1020, if a succeeding bitmap of the current
bitmap series does not exist, the NO branch of decisional step 1020 leads to
decisional step 1024. At decisional step 1024, it is determined if a succeeding
bitmap series exists. If a succeeding bitmap series does not exist, then the process is
finished and the NO branch of decisional step 1024 leads to step 1026. If a
succeeding bitmap series exists, the YES branch of decisional step 1024 leads to step
1028. At step 1028, the succeeding bitmap series becomes the current bitmap series.
Step 1028 returns to step 1000.
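The control flow of steps 1000 through 1028 amounts to two nested loops: an outer loop over bitmap series and an inner loop over the bitmaps of each series, with the average images reused throughout the inner loop. A minimal sketch, assuming hypothetical record types `BitmapSeries` and `BitmapRecord` and caller-supplied `predict`, `overlay`, and `show` functions standing in for the display processor and frame buffer:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class BitmapRecord:
    triangle_locations: Dict[int, object]  # hypothetical: triangle id -> current location
    correction: object                     # hypothetical: correction bitmap for this bitmap


@dataclass
class BitmapSeries:
    average_images: Dict[int, object]      # hypothetical: triangle id -> average image
    bitmaps: List[BitmapRecord]


def decode(series_list: Sequence[BitmapSeries],
           predict: Callable[[Dict[int, object], Dict[int, object]], object],
           overlay: Callable[[object, object], object],
           show: Callable[[object], None]) -> None:
    # Steps 1000, 1024, and 1028: take each bitmap series in turn.
    for series in series_list:
        # Step 1002: average images are fetched once and retained for the whole series.
        averages = series.average_images
        # Steps 1020 and 1022: advance through the bitmaps of the current series.
        for record in series.bitmaps:
            locations = record.triangle_locations            # steps 1006-1008
            predicted = predict(averages, locations)         # steps 1010-1012
            display = overlay(predicted, record.correction)  # from step 1014: apply the correction
            show(display)                                    # pass to the frame buffer
    # Step 1026: no succeeding series remains, so decoding is finished.
```

Passing the prediction and overlay steps in as functions keeps the loop independent of how those steps are implemented; an affine triangle warp and a clamped residual addition are one possible choice.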

Having illustrated and described the principles of the present invention in a 
preferred embodiment, it should be apparent to those skilled in the art that the 
embodiment can be modified in arrangement and detail without departing from such 
principles. Accordingly, we claim as our invention all such embodiments as come
within the scope and spirit of the following claims and equivalents thereto. 
WE CLAIM:

1. A method of encoding in a compressed format information within a video
image frame sequence having first and second video image frames that include an 
arbitrary image feature with an arbitrary configuration, the arbitrary image feature 
having different attributes in the first and second video image frames, the method 
comprising: 

determining a dense motion transformation between the arbitrary image 
feature in the first and second video image frames to determine an estimated 
arbitrary image feature in the second video image frame; and 

identifying a difference between the estimated arbitrary image feature in the 
second video image frame and the arbitrary image feature in the second video image 
frame to determine a transform error for the arbitrary image feature. 

2. The method of claim 1 in which the arbitrary image feature includes as 
attributes a position, an orientation, and a configuration in each of the first and
second video image frames and the difference between the attributes of the arbitrary 
image feature in the first and second video image frames includes a difference in at 
least one of the position, orientation, or configuration. 

3. The method of claim 1 further comprising applying the transform error to 
the estimated arbitrary image feature in the second video image frame to form a 
corrected image feature in the second video image frame. 

4. The method of claim 3 in which the video image frame sequence further
includes a third video image frame that includes the arbitrary image feature and the 
method further comprises determining a dense motion transformation between the 
corrected image feature in the second video image frame and the arbitrary image 
feature in the third video image frame to determine an estimated arbitrary image 
feature in the third video image frame. 

5. The method of claim 1 in which image features in the video image frame 
sequence are formed from plural pixels and in which determining the dense motion 
transformation between the arbitrary image feature in the first and second video 
image frames includes identifying corresponding pixels of the arbitrary image feature 
in the first and second video image frames. 
6. The method of claim 5 further comprising identifying all corresponding
pixels of the arbitrary image feature in the first and second video image frames. 

7. The method of claim 1 in which determining the dense motion 
transformation between the arbitrary image feature in the first and second video
image frames includes determining affine motion transformations between the
arbitrary image feature in the first and second video image frames.

8. The method of claim 1 in which the first video image frame precedes the 
second video image frame. 

9. The method of claim 1 further comprising encoding the transform error in
a first compressed format, decoding the transform error from the first compressed
format to form a quantized transform error, and correcting the estimated arbitrary
image feature in the second video image frame according to the quantized transform error.

10. The method of claim 9 in which the transform error encoded in the first
compressed format is a lossy representation of the transform error. 

11. The method of claim 9 further comprising encoding in a second
compressed format the transform error encoded in the first compressed format. 

12. The method of claim 11 in which the first and second compressed
formats are, respectively, lossy and lossless compression formats. 

13. The method of claim 1 in which the first and second video image frames
further include plural other arbitrary image features with arbitrary configurations, at
least one of the other arbitrary image features having different attributes in the first
and second video image frames, the method further comprising: 

determining dense motion transformations between the other arbitrary image
features in the first and second video image frames to determine estimated other
arbitrary image features in the second video image frame; and

identifying differences between the estimated other arbitrary image features in 
the second video image frame and the other arbitrary image features in the second
video image frame to determine transform errors for the other arbitrary image 
features. 

14. A data structure stored on a computer-readable medium and representing
in a compressed format information within a video image frame sequence having
plural video image frames that include plural image features that have different
attributes in the selected ones of the video image frames, the data structure 
comprising: 

selected image feature data corresponding to selected characteristics of the 
plural image features;

affine transform coefficient data corresponding to coefficients of affine 
transformations that represent changes of the plural image features between the 
selected ones of the video image frames; and 

transform error data corresponding to errors in the changes of the plural 
image features represented by the affine transformations.

15. The data structure of claim 14 in which at least one of the affine transform
coefficient data and the transform error data is encoded in a compressed format. 

16. The data structure of claim 14 in which the selected characteristics of the
plural image features include binary mask representations of the plural image
features.

17. The data structure of claim 14 in which the selected characteristics of the 
plural image features include plural selected pixels from each of the plural image 
features. 

18. The data structure of claim 14 in which the selected characteristics of the 
plural image features include sprites that each represent the different attributes of
one of the image features in the plural video image frames.

19. A method of decoding compressed information relating to an arbitrary
image feature with an arbitrary configuration within first and second video image
frames of a video image frame sequence, the arbitrary image feature having different
attributes in the first and second video image frames, the method comprising:

applying a dense motion transformation to a representation of the arbitrary 
image feature in the first video image frame to form an estimated arbitrary image 
feature in the second video image frame; and 

applying a transform difference to the estimated arbitrary image feature in the 
second video image frame to obtain the arbitrary image feature in the second video
image frame. 
20. The method of claim 19 in which the video image frame sequence
further includes a third video image frame that includes the arbitrary image feature 
and the method further comprises: 

applying a dense motion transformation to the arbitrary image feature in the 
second video image frame to form an estimated arbitrary image feature in the third 
video image frame; and 

applying a transform difference to the estimated arbitrary image feature in the 
third video image frame to obtain the arbitrary image feature in the third video 
image frame. 

21. The method of claim 19 in which image features in the video image 
frame sequence are formed from plural pixels and in which the dense motion 
transformation represents correlations between corresponding pixels of the arbitrary 
image feature in the first and second video image frames. 

22. The method of claim 19 in which the dense motion transformation
includes an affine motion transformation between the arbitrary image feature in the
first and second video image frames. 

23. The method of claim 19 in which the first video image frame precedes 
the second video image frame. 

24. The method of claim 19 in which the first and second video image 
frames further include plural other arbitrary image features with arbitrary
configurations, at least one of the other arbitrary image features having different 
attributes in the first and second video image frames, the method further comprising: 

applying dense motion transformations to representations of the other 
arbitrary image features in the first video image frame to form estimated other 
arbitrary image features in the second video image frame; and

applying transform differences to the estimated other arbitrary image features
in the second video image frame to obtain the other arbitrary image features in the 
second video image frame. 

25. A computer-readable medium storing computer-executable programming 
for encoding in a compressed format information within a video image frame
sequence having first and second video image frames that include an arbitrary image
feature with an arbitrary configuration, the arbitrary image feature having different
attributes in the first and second video image frames, the medium comprising: 

programming for determining a dense motion transformation between the 
arbitrary image feature in the first and second video image frames to determine an
estimated arbitrary image feature in the second video image frame; and 

programming for identifying a difference between the estimated arbitrary 
image feature in the second video image frame and the arbitrary image feature in the
second video image frame to determine a transform error for the arbitrary image 
feature. 

26. A computer-readable medium storing computer-executable programming 
for decoding compressed information relating to an arbitrary image feature with an 
arbitrary configuration within first and second video image frames of a video image
frame sequence, the arbitrary image feature having different attributes in the first and
second video image frames, the medium comprising:

programming for applying a dense motion transformation to a representation
of the arbitrary image feature in the first video image frame to form an estimated 
arbitrary image feature in the second video image frame; and

programming for applying a transform difference to the estimated arbitrary 
image feature in the second video image frame to obtain the arbitrary image feature
in the second video image frame. 

27. A method of encoding in a compressed format information within a 
video image frame sequence having first and second video image frames that include 
an image component, the image component having different attributes in the first 
and second video image frames, the method comprising: 

determining a motion transformation between the image component in the
first and second video image frames to determine an estimated image component in 
the second video image frame; 

identifying a difference between the estimated image component in the 
second video image frame and the image component in the second video image 
frame to determine a transform error for the image component;

applying the transform error to the estimated image component in the second 
video image frame to form a corrected image component in the second video image
frame; and 

encoding the corrected image component in a first compressed format. 

28. A block matching motion estimation method for estimating motion of 
corresponding pixels between first and second video image frames, comprising:

defining a reference pixel block of multiple pixels relative to a first reference 
pixel in the first video image frame and a sample pixel block of multiple sample 
pixels in the second video image frame, the reference pixel block being a non- 
quadrilateral polygonal array of pixels; 

determining and storing for the pixels in the sample pixel block correlations
to the pixels in the reference pixel block; and 

identifying from the correlations a first sample pixel corresponding to the 
first reference pixel. 

29. The method of claim 28 in which the first reference pixel and the first
sample pixel are included in arbitrary first and second image features in the first and
second video image frames, respectively, the arbitrary first image feature having an 
interior that is bounded by an image feature perimeter and the reference pixel block 
including a pixel block perimeter that conforms to the image feature perimeter. 

30. The method of claim 29 further comprising: 

defining relative to the first reference pixel a preliminary quadrilateral
reference pixel block of plural pixels; 

identifying the pixels of the preliminary quadrilateral pixel block as to
whether they are in the interior of the arbitrary first image feature, at least one of 
the pixels in the preliminary quadrilateral pixel block not being in the interior of the 
arbitrary first image feature; and

establishing as the reference pixel block the pixels of the preliminary 
quadrilateral pixel block in the interior of the arbitrary first image feature.

31. The method of claim 29 in which the arbitrary first image feature
includes plural image feature pixels, the method further comprising: 

defining a reference pixel block of multiple pixels relative to each image
feature pixel in the first video image frame and defining a corresponding sample 
pixel block of multiple sample pixels in the second video image frame, at least one
of the reference pixel blocks being a non-quadrilateral polygonal array of pixels; 

determining and storing for the pixels in the sample pixel block correlations 
to the image feature pixel relative to which the corresponding reference pixel block 
is defined; and

identifying from the correlations selected sample pixels in correlation with 
the image feature pixels relative to which the corresponding reference pixel blocks 
are defined. 

32. The method of claim 28 in which determining correlations includes 
determining summed absolute errors between pixels of the sample pixel block and
the reference pixel block. 

33. The method of claim 32 in which the first sample pixel corresponding to 
the first reference pixel is the sample pixel for which the summed absolute error 
between the sample pixel block and the reference pixel block is minimal. 

34. The method of claim 32 in which each of the reference and sample
pixels is represented by three color component values and the summed absolute 
errors E are determined as: 

$$E = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( |r_{ij} - r_{ij}'| + |g_{ij} - g_{ij}'| + |b_{ij} - b_{ij}'| \right),$$

in which $r_{ij}$, $g_{ij}$, and $b_{ij}$ correspond to the color component values representing the
reference pixels and $r_{ij}'$, $g_{ij}'$, and $b_{ij}'$ correspond to the color component values
representing the sample pixels.

35. The method of claim 34 in which the three color component values 
correspond to red, green, and blue color components.

36. The method of claim 28 in which the first video image frame precedes 
the second video image frame. 

37. A block matching motion estimation method for estimating motion of 
pixels between first and second video image frames that include an arbitrary first
image feature of plural image feature pixels, comprising:

identifying the arbitrary first image feature in the first video image frame;

defining a reference pixel block of multiple pixels relative to each image
feature pixel in the first video image frame and a corresponding sample pixel block
of multiple sample pixels in the second video image frame; 

identifying for each reference pixel block in the first video image frame the 
pixels of the arbitrary first image feature; and 

identifying from the sample pixels in the sample pixel block first sample 
pixels corresponding to the image feature pixels. 

38. The method of claim 37 in which the arbitrary first image feature has an
interior that is bounded by an image feature perimeter, the method further 
comprising: 

defining relative to each image feature pixel a preliminary quadrilateral 
reference pixel block of plural pixels; 

identifying the pixels of the preliminary quadrilateral reference pixel block as 
to whether they are in the interior of the arbitrary first image feature; and 

establishing as the reference pixel block the pixels of the preliminary 
quadrilateral reference pixel block in the interior of the arbitrary first image feature.

39. The method of claim 38 in which at least one of the pixels in the 
preliminary quadrilateral pixel block is not in the interior of the arbitrary first image
feature and the reference pixel block is defined to include a pixel block perimeter 
that conforms to the image feature perimeter. 

40. The method of claim 37 in which the reference pixel block is a non- 
quadrilateral polygonal array of pixels. 

41. A computer-readable medium storing computer-executable programming 
for estimating motion of corresponding pixels between first and second video image 
frames, the medium comprising: 

programming for defining a reference pixel block of multiple pixels relative
to a first reference pixel in the first video image frame and a sample pixel block of 
multiple sample pixels in the second video image frame, the reference pixel block 
being a non-quadrilateral polygonal array of pixels; 

programming for determining and storing for the pixels in the sample pixel 
block correlations to the reference pixel block and the first reference pixel; and

programming for identifying from the correlations a first sample pixel
corresponding to the first reference pixel.

42. The medium of claim 41 in which the first reference pixel and the first 
sample pixel are included in arbitrary first and second image features in the first and 
second video image frames, respectively, the arbitrary first image feature having an
interior that is bounded by an image feature perimeter and the reference pixel block
including a pixel block perimeter that conforms to the image feature perimeter, the 
medium further comprising: 

programming for defining relative to the first reference pixel a preliminary 
quadrilateral reference pixel block of plural pixels; 

programming for identifying the pixels of the preliminary quadrilateral pixel
block as to whether they are in the interior of the arbitrary first image feature, at 
least one of the pixels in the preliminary quadrilateral pixel block not being in the 
interior of the arbitrary first image feature; and 

programming for establishing as the reference pixel block the pixels of the 
preliminary quadrilateral pixel block in the interior of the arbitrary first image
feature. 

43. The medium of claim 41 in which the first reference pixel and the first 
sample pixel are included in arbitrary first and second image features in the first and 
second video image frames, respectively, the arbitrary first image feature having
plural image feature pixels in an interior that is bounded by an image feature
perimeter, the reference pixel block including a pixel block perimeter that conforms 
to the image feature perimeter, the medium further comprising: 

programming for defining a reference pixel block of multiple pixels relative
to each image feature pixel in the first video image frame and defining a
corresponding sample pixel block of multiple sample pixels in the second video
image frame, at least one of the reference pixel blocks being a non-quadrilateral 
polygonal array of pixels; 

programming for determining and storing for the pixels in the sample pixel 
block a correlation to the image feature pixel relative to which the corresponding 
reference pixel block is defined; and

programming for identifying from the correlations a selected sample pixel in 
correlation with the image feature pixel relative to which the corresponding reference
pixel block is defined.

44. A block matching motion estimation method for estimating motion of
corresponding pixels between first and second
video image frames, comprising: 

defining first and second reference pixel blocks of multiple pixels relative to 
respective first and second reference pixels in the first video image frame and a 
sample pixel block of multiple sample pixels in the second video image frame, the 
first reference, second reference, and sample pixel blocks including plural respective 
first reference, second reference, and sample subsets of multiple pixels; 

determining and storing first correlations between the sample subsets and the 
first reference subsets; 

determining and storing second correlations between the sample subsets and 
the second reference subsets, wherein at least one of the second correlations matches 
one of the first correlations and determining the at least one of the second 
correlations includes retrieving the matching one of the first correlations; and

identifying from the first and second correlations first and second sample 
pixels corresponding to the first and second reference pixels, respectively. 

45. The method of claim 44 in which the pixels of the first and second
video image frames are arranged as regular arrays of pixels and the first and second
reference subsets are commonly aligned segments of the regular arrays.

46. The method of claim 44 in which the pixels of the first and second
video image frames are arranged as regular arrays of rows and columns of pixels
and the first and second reference subsets are portions of columns of the regular
arrays.

47. The method of claim 44 in which the first and second correlations
include multiple correlation components and a selected one of the second 
correlations matches a selected first correlation and includes fewer than all the 
correlation components of the selected first correlation, the method further 
comprising: 

retrieving the selected first correlation; and

conforming the selected first correlation to the selected second correlation. 
48. The method of claim 47 in which the selected second correlation
includes a new correlation component not included in the selected first correlation 
and conforming the selected first correlation includes incorporating into it the new 
correlation component.

49. The method of claim 47 in which the selected second correlation
includes a new correlation component not included in the selected first correlation 
and the selected first correlation includes a prior correlation component not included 
in the selected second correlation, wherein conforming the selected first correlation 
includes omitting the prior correlation component from and incorporating the new 
correlation component in the selected first correlation.

50. The method of claim 44 in which at least one of the first and second 
reference pixel blocks is a non-quadrilateral polygonal array of pixels. 

51. The method of claim 44 in which the first reference pixel and the first
sample pixel are included in an arbitrary first image feature in the first and second
video image frames, respectively, the arbitrary first image feature having an interior
that is bounded by an image feature perimeter and the first reference pixel block 
including a pixel block perimeter that conforms to the image feature perimeter.

52. The method of claim 44 in which determining correlations includes 
determining mean absolute errors between the sample and reference subsets of
multiple pixels.

53. In a block matching motion estimation method for estimating motion of 
corresponding pixels between first and second video image frames, a method of 
determining correlations between plural reference pixel blocks of multiple pixels and 
a sample pixel block of multiple sample pixels in the second video image frame,
each reference pixel block being relative to a reference pixel in the first video image
frame, comprising: 

defining within the reference and sample pixel blocks respective reference 
and sample subsets of multiple pixels; 

determining and storing correlations between the sample and reference 
subsets; and

identifying from the correlations first sample pixels corresponding to the 
reference pixels.

54. The method of claim 53 in which first and second correlations are 
determined between the sample subsets and the reference subsets of preceding 
reference pixel blocks and subsequent reference pixel blocks, respectively.

55. The method of claim 54 in which at least one of the second correlations
matches one of the first correlations and determining the at least one of the second 
correlations includes retrieving the matching one of the first correlations. 

56. The method of claim 54 in which the first and second correlations 
include multiple correlation components and a selected second correlation includes
fewer than all the correlation components of a selected first correlation that matches
the selected second correlation, the method further comprising:

retrieving the selected first correlation; and

conforming the selected first correlation to the selected second correlation. 

57. The method of claim 56 in which the selected second correlation
includes a new correlation component not included in the selected first correlation
and conforming the selected first correlation includes incorporating into it the new 
correlation component. 

58. The method of claim 56 in which the selected second correlation 
includes a new correlation component not included in the selected first correlation
and the selected first correlation includes a prior correlation component not included
in the selected second correlation, wherein conforming the selected first correlation 
includes omitting the prior correlation component from and incorporating the new 
correlation component in the selected first correlation. 

59. A computer-readable medium storing computer-executable programming
for estimating motion of corresponding pixels between first and second video image
frames, the medium comprising:

programming for defining within the reference and sample pixel blocks
respective reference and sample subsets of multiple pixels;

programming for determining and storing correlations between the sample 
and reference subsets; and

programming for identifying from the correlations first sample pixels 
corresponding to the reference pixels.

60. The medium of claim 59 further comprising programming for
determining first and second correlations between the sample subsets and the 
reference subsets of preceding reference pixel blocks and subsequent reference pixel
blocks, respectively.

61. The medium of claim 60 further comprising programming for 
determining that a selected second correlation matches a selected first correlation and 
programming for determining the selected second correlation with retrieval of the 
matching selected first correlation. 

62. The medium of claim 60 in which the first and second correlations have
multiple correlation components and a selected second correlation has fewer than all 
the correlation components of the matching one of the first correlations, the medium 
further comprising: 

programming for retrieving the matching one of the first correlations; and 

programming for conforming the matching one of the first correlations to the
at least one of the second correlations. 

63. A data structure stored on a computer-readable medium and representing
an estimation of motion of corresponding pixels between first and second video
image frames, the first video image frame including first and second reference pixels
relative to which respective first and second reference pixel blocks are defined, and
the second video image frame including a sample pixel block of multiple sample
pixels, comprising:

reference and sample pixel block subset data representing multiple pixel
subsets of the reference and sample pixel blocks; and

subset correlation data representing correlations between the multiple pixel
subsets of the reference and sample pixel blocks.

64. A precompression video transformation method of transforming an
arbitrary image feature with a feature boundary of arbitrary configuration to an
image component of predetermined configuration for encoding in a compressed
format, the arbitrary image feature and feature boundary including plural feature
pixels with associated pixel values and the image component including the plural
feature pixels and plural non-feature pixels, the method comprising:

defining the image component of predetermined configuration about the 
image feature and identifying the non-feature pixels; 

identifying plural non-feature pixel sets each of plural adjacent non-feature
pixels and including at least one non-feature pixel adjacent a feature pixel of the
feature boundary; and

assigning to the non-feature pixels in each non-feature pixel set a pixel value
that includes the pixel value of the feature pixel of the feature boundary adjacent the 
at least one of the non-feature pixels in the non-feature pixel set. 

65. The method of claim 64 in which the image component includes non- 
feature pixels other than the ones in the plural non-feature pixel sets, the method 
further comprising: 

identifying as unassigned pixels the non-feature pixels in the image
component not included in the non-feature pixel sets; 

identifying pairs of non-feature pixel sets adjacent the unassigned pixels; and

assigning to unassigned pixels pixel values that include the pixel values of
the adjacent pairs of non-feature pixel sets.

66. The method of claim 64 in which the feature and non-feature pixels of
the image component are arranged as an array of rows and columns of pixels and 
the non-feature pixel sets include a row and a column of non-feature pixels. 

67. The method of claim 64 in which selected pairs of non-feature pixel sets 
include common non-feature pixels and the common non-feature pixels include the 
pixel values assigned to the non-feature pixels of both non-feature pixel sets. 

68. The method of claim 67 in which the common non-feature pixels are 
assigned pixel values that include averages of the pixel values assigned to the
non-feature pixels of both non-feature pixel sets.

69. The method of claim 64 in which non-feature pixels in at least one of 
the non-feature pixel sets are assigned the pixel value of the adjacent feature pixel of 
the feature boundary. 

70. The method of claim 64 in which non-feature pixels in one non-feature 
pixel set are assigned the pixel value of the feature pixel of the feature boundary
adjacent the at least one of the non-feature pixels in the non-feature pixel set.
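
Claims 64 to 70 describe filling the non-feature pixels of a component of predetermined (for example rectangular) configuration from the boundary of an arbitrary feature, row by row and column by column. The Python sketch below is one illustrative reading under stated assumptions: the component is a 2-D list with a Boolean feature mask; each run of adjacent non-feature pixels in a row or column takes the value of its bordering feature pixel (the average of both when the run lies between two feature pixels, an illustrative choice); and pixels covered by both a row run and a column run receive the average of the two, in the spirit of claim 68. The function names are hypothetical, and the claim 65 step for pixels reached by neither pass is not sketched.

```python
from typing import List, Optional

Grid = List[List[float]]
Mask = List[List[bool]]


def _fill_line(values: List[float], mask: List[bool]) -> List[Optional[float]]:
    """Give each run of adjacent non-feature pixels in one row or column the
    average of the feature pixel(s) bordering that run; None if no border."""
    n = len(values)
    out: List[Optional[float]] = [None] * n
    i = 0
    while i < n:
        if mask[i]:
            i += 1
            continue
        j = i
        while j < n and not mask[j]:
            j += 1
        # Run of non-feature pixels is [i, j); values[i-1] and values[j],
        # when they exist, are feature pixels on the feature boundary.
        border = []
        if i > 0:
            border.append(values[i - 1])
        if j < n:
            border.append(values[j])
        if border:
            fill = sum(border) / len(border)
            for k in range(i, j):
                out[k] = fill
        i = j
    return out


def extrapolate(values: Grid, mask: Mask) -> List[List[Optional[float]]]:
    """Hypothetical row/column extrapolation of a feature into its component."""
    rows, cols = len(values), len(values[0])
    by_row = [_fill_line(values[r], mask[r]) for r in range(rows)]
    by_col = [_fill_line([values[r][c] for r in range(rows)],
                         [mask[r][c] for r in range(rows)])
              for c in range(cols)]
    out: List[List[Optional[float]]] = [list(row) for row in values]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                continue  # feature pixels keep their own values
            cands = [v for v in (by_row[r][c], by_col[c][r]) if v is not None]
            # Pixels in both a row set and a column set get the average;
            # pixels in neither stay None (claim 65 step not sketched).
            out[r][c] = sum(cands) / len(cands) if cands else None
    return out
```

For a 3x3 component whose only feature pixel is a center value of 9, the four edge pixels become 9.0 and the four corners remain None, since no row or column through a corner contains a feature pixel.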

71. A precompression video transformation method of transforming an 
arbitrary image feature with a feature boundary of arbitrary configuration to an 
image component of predetermined configuration for encoding in a compressed 
format, the arbitrary image feature and feature boundary including plural feature
pixels with associated pixel values and the image component including the plural 
feature pixels and plural non-feature pixels, the method comprising: 

assigning to each of the non-feature pixels of the image component a pixel 
value that includes the pixel value of a feature pixel of the feature boundary. 

72. The method of claim 71 in which selected non-feature pixels in the
image component are assigned pixel values of plural feature pixels of the feature
boundary.

73. The method of claim 72 in which the selected non-feature pixels are
assigned pixel values that include an average of the pixel values of plural feature
pixels of the feature boundary.

74. The method of claim 71 in which non-feature pixels in the image 
component are assigned the pixel values of feature pixels of the feature boundary.

75. The method of claim 71 in which the feature and non-feature pixels of
the image component are arranged as an array of rows and columns of pixels, the
method further comprising:

identifying selected non-feature pixels of the image component that are in 
rows or columns with feature pixels of the feature boundary; and 

assigning to the selected non-feature pixels pixel values that include the pixel 
values of the feature pixels of the feature boundary. 

76. A computer-readable medium storing computer-executable programming
for transforming an arbitrary image feature with a feature boundary of arbitrary
configuration to an image component of predetermined configuration for encoding in
a compressed format, the arbitrary image feature and feature boundary including
plural feature pixels with associated pixel values and the image component including
the plural feature pixels and plural non-feature pixels, the medium comprising:

programming for assigning to each of the non-feature pixels of the image 
component a pixel value that includes the pixel value of a feature pixel of the
feature boundary. 

77. The medium of claim 76 further comprising programming for assigning
to selected non-feature pixels in the image component pixel values of plural feature
pixels of the feature boundary.

78. A data structure stored on a computer-readable medium and representing 
a precompression extrapolation of an arbitrary image feature with a feature boundary 
of arbitrary configuration to an image component of predetermined configuration, 
the arbitrary image feature and feature boundary including plural feature pixels with
associated pixel values and the image component including the plural feature pixels
and plural non-feature pixels, the data structure comprising: 

image feature data representing the arbitrary image feature and the feature 
boundary of arbitrary configuration and including pixel values of pixels in the 
feature boundary; 

image component data representing the image component of predetermined 
configuration about the image feature; and 

non-feature pixel data representing the non-feature pixels with pixel values
that include the pixel values of the pixels in the feature boundary.

79. The data structure of claim 78 in which the non-feature pixel data 
includes pixel values that represent non-feature pixels with pixel values that are the 
same as pixel values of pixels in the feature boundary. 

80. The data structure of claim 78 in which the non-feature pixel data
includes pixel values that represent non-feature pixels with pixel values that are
averages of pixel values of pixels in the feature boundary.
The present invention relates to processes for compressing digital video
signals and, in particular, to an object-based digital video encoding process with 
error feedback to increase accuracy. 

BACKGROUND OF THE INVENTION 

Full-motion video displays based upon analog video signals have long been 
available in the form of television. With recent increases in computer processing 
capabilities and affordability, full-motion video displays based upon digital video 
signals are becoming more widely available. Digital video systems can provide 
significant improvements over conventional analog video systems in creating, 
modifying, transmitting, storing, and playing full-motion video sequences. 

Digital video displays include large numbers of image frames that are played 
or rendered successively at frequencies of between 30 and 75 Hz. Each image frame 
is a still image formed from an array of pixels according to the display resolution of 
a particular system. As examples, VHS-based systems have display resolutions of 
320x480 pixels, NTSC-based systems have display resolutions of 720x486 pixels, 
and high-definition television (HDTV) systems under development have display 
resolutions of 1360x1024 pixels. 

The amounts of raw digital information included in video sequences are 
massive. Storage and transmission of these amounts of video information are
infeasible with conventional personal computer equipment. With reference to a 
digitized form of a relatively low resolution VHS image format having a 320x480 
pixel resolution, a full-length motion picture of two hours in duration could 
correspond to 100 gigabytes of digital video information. By comparison, 
conventional compact optical disks have capacities of about 0.6 gigabytes, magnetic 
hard disks have capacities of 1-2 gigabytes, and compact optical disks under 
development have capacities of up to 8 gigabytes. 
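
The 100-gigabyte figure follows from simple arithmetic. The sketch below assumes 24-bit color (3 bytes per pixel) and 30 frames per second, neither of which the text states; under those assumptions a two-hour 320x480 sequence comes to roughly 99.5 gigabytes, or about 166 of the 0.6-gigabyte compact optical disks mentioned above. The function name is illustrative.

```python
# Back-of-the-envelope size of uncompressed digital video.
# Assumptions (not stated in the text): 24-bit color, 30 frames per second.

def raw_video_bytes(width: int, height: int, bytes_per_pixel: int,
                    frames_per_second: int, seconds: int) -> int:
    """Bytes needed to store every pixel of every frame without compression."""
    return width * height * bytes_per_pixel * frames_per_second * seconds


total = raw_video_bytes(320, 480, 3, 30, 2 * 60 * 60)
print(f"{total / 1e9:.1f} GB")                # 99.5 GB
print(f"{total / 0.6e9:.0f} compact disks")   # at 0.6 GB per disk
```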

In response to the limitations in storing or transmitting such massive amounts 
of digital video information, various video compression standards or processes have
been established, including MPEG-1, MPEG-2, and H.26X. These conventional
video compression techniques utilize similarities between successive image frames,
referred to as temporal or interframe correlation, to provide interframe compression
in which pixel-based representations of image frames are converted to motion
representations. In addition, the conventional video compression techniques utilize
similarities within image frames, referred to as spatial or intraframe correlation, to
provide intraframe compression in which the motion representations within an image
frame are further compressed. Intraframe compression is based upon conventional
processes for compressing still images, such as discrete cosine transform (DCT)
encoding.
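
DCT encoding supports intraframe compression because it concentrates the energy of a smooth block of pixels into a few low-frequency coefficients. The unoptimized Python sketch below computes an orthonormal 2-D DCT-II by applying the 1-D transform to the rows and then to the columns. The function names and the 8x8 example block are illustrative assumptions; practical encoders use fast factorized transforms followed by quantization, which is not shown.

```python
import math
from typing import List

Block = List[List[float]]


def dct_1d(x: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II: X[k] = s(k) * sum_n x[n] * cos(pi/N * (n + 0.5) * k)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k)
                               for i in range(n)))
    return out


def dct_2d(block: Block) -> Block:
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(list(r)) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


# A flat 8x8 block: all of its energy lands in the single DC coefficient.
coeffs = dct_2d([[10.0] * 8 for _ in range(8)])
print(round(coeffs[0][0], 6))  # 80.0
```

The remaining 63 coefficients of the flat block are zero up to rounding error, which is why such a block compresses to very few bits.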

Although differing in specific implementations, the MPEG-1, MPEG-2, and
H.26X video compression standards are similar in a number of respects. The
following description of the MPEG-2 video compression standard is generally
applicable to the others.

MPEG-2 provides interframe compression and intraframe compression based
upon square blocks or arrays of pixels in video images. A video image is divided
into transformation blocks having dimensions of 16x16 pixels. For each
transformation block $T_N$ in an image frame N, a search is performed across the
image of an immediately preceding image frame N-1 or also a next successive video
frame N+1 (i.e., bidirectionally) to identify the most similar respective
transformation blocks $T_{N-1}$ or $T_{N+1}$.

Ideally, and with reference to a search of the next successive image frame,
the pixels in transformation blocks $T_N$ and $T_{N+1}$ are identical, even if the
transformation blocks have different positions in their respective image frames.
Under these circumstances, the pixel information in transformation block $T_{N+1}$ is
redundant to that in transformation block $T_N$. Compression is achieved by
substituting the positional translation between transformation blocks $T_N$ and $T_{N+1}$
for the pixel information in transformation block $T_{N+1}$. In this simplified example, a
single translational vector $(\Delta X, \Delta Y)$ is designated for the video information
associated with the 256 pixels in transformation block $T_{N+1}$.
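
The block search described above can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). The SAD criterion, the search radius, and the function name are assumptions chosen for illustration; the text fixes only the 16x16 block size, and the standards leave the search strategy to the encoder. The returned displacement plays the role of the translational vector $(\Delta X, \Delta Y)$, and the residual SAD indicates how much pixel difference remains after the translation. The example uses a tiny 2x2 block so the result is easy to check by hand.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def best_translation(cur: Frame, ref: Frame, bx: int, by: int,
                     size: int, radius: int) -> Tuple[Tuple[int, int], int]:
    """Full-search block matching.

    Compares the size x size block whose top-left corner is (bx, by) in `cur`
    with every block displaced by (dx, dy), |dx|, |dy| <= radius, in `ref`,
    and returns the displacement with the lowest SAD together with that SAD.
    """
    height, width = len(ref), len(ref[0])
    best: Optional[Tuple[Tuple[int, int], int]] = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue  # candidate block would fall outside the reference frame
            sad = sum(abs(cur[by + j][bx + i] - ref[y + j][x + i])
                      for j in range(size) for i in range(size))
            if best is None or sad < best[1]:
                best = ((dx, dy), sad)
    if best is None:
        raise ValueError("no candidate block fits inside the reference frame")
    return best


ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for j, row in enumerate([[5, 6], [7, 8]]):
    for i, v in enumerate(row):
        ref[3 + j][4 + i] = v   # pattern at (4, 3) in the reference frame
        cur[2 + j][2 + i] = v   # same pattern at (2, 2) in the current frame
print(best_translation(cur, ref, 2, 2, size=2, radius=3))  # ((2, 1), 0)
```
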
Frequently, the video information (i.e., pixels) in the corresponding
transformation blocks $T_N$ and $T_{N+1}$ are not identical. The difference between them is
designated a transformation block error E, which often is significant. Although it is 
compressed by a conventional compression process such as discrete cosine transform 
(DCT) encoding, the transformation block error E is cumbersome and limits the 
extent (ratio) and the accuracy by which video signals can be compressed. 

Large transformation block errors E arise in block-based video compression 
methods for several reasons. The block-based motion estimation represents only 
translational motion between successive image frames. The only changes between
corresponding transformation blocks $T_N$ and $T_{N+1}$ that can be represented are changes
in the relative positions of the transformation blocks. A disadvantage of such
representations is that full-motion video sequences frequently include complex
motions other than translation, such as rotation, magnification and shear.
Representing such complex motions with simple translational approximations results
in significant errors.

Another aspect of video displays is that they typically include multiple image 
features or objects that change or move relative to each other. Objects may be 
distinct characters, articles, or scenery within a video display. With respect to a 
scene in a motion picture, for example, each of the characters (i.e., actors) and 
articles (i.e., props) in the scene could be a different object. 

The relative motion between objects in a video sequence is another source of 
significant transformation block errors E in conventional video compression 
processes. Due to the regular configuration and size of the transformation blocks, 
many of them encompass portions of different objects. Relative motion between the 
objects during successive image frames can result in extremely low correlation (i.e., 
high transformation errors E) between corresponding transformation blocks. 
Similarly, the appearance of portions of objects in successive image frames (e.g., 
when a character turns) also introduces high transformation errors E. 

Conventional video compression methods appear to be inherently limited due 
to the size of transformation errors E. With the increased demand for digital video 
display capabilities, improved digital video compression processes are required. 
SUMMARY OF THE INVENTION

The present invention includes a video compression encoder process for 
compressing digitized video signals representing display motion in video sequences 
of multiple image frames. The encoder process utilizes object-based video 
compression to improve the accuracy and versatility of encoding interframe motion 
and intraframe image features. Video information is compressed relative to objects 
of arbitrary configurations, rather than fixed, regular arrays of pixels as in 
conventional video compression methods. This reduces the error components and 
thereby improves the compression efficiency and accuracy. As another benefit, 
object-based video compression of this invention provides interactive video editing 
capabilities for processing compressed video information. 

In a preferred embodiment, the process or method of this invention includes 
identifying image features of arbitrary configuration in a first video image frame and 
defining within the image feature multiple distinct feature points. The feature points 
of the image feature in the first video image frame are correlated with corresponding 
feature points of the image feature in a succeeding second video image frame, 
thereby to determine an estimation of the image feature in the second video image 
frame. A difference between the estimated and actual image feature in the second 
video image frame is determined and encoded in a compressed format. 

The encoder process of this invention overcomes the shortcomings of the 
conventional block-based video compression methods. The encoder process 
preferably uses a multi-dimensional transformation method to represent mappings 
between corresponding objects in successive image frames. The multiple dimensions 
of the transformation refer to the number of coordinates in its generalized form. 
The multi-dimensional transformation is capable of representing complex motion that 
includes any or all of translation, rotation, magnification, and shear. As a result, 
complex motion of objects between successive image frames may be represented 
with relatively low transformation error. 
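
As an illustration of a transformation that covers all four kinds of motion, the sketch below composes a six-coefficient affine mapping $x' = ax + by + c$, $y' = dx + ey + f$ from a shear, a magnification, a rotation, and a translation, applied in that order. The parameter names, the composition order, and the use of this particular parameterization are illustrative assumptions; the text states only that the transformation can represent translation, rotation, magnification, and shear.

```python
import math
from typing import Tuple

Affine = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def compose_affine(tx: float, ty: float, angle: float,
                   sx: float, sy: float, shear: float) -> Affine:
    """Return coefficients of x' = a*x + b*y + c, y' = d*x + e*y + f.

    The linear part is R(angle) @ diag(sx, sy) @ [[1, shear], [0, 1]], i.e. a
    horizontal shear, then magnification, then rotation; (tx, ty) is added last.
    """
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    a = cos_t * sx
    b = cos_t * sx * shear - sin_t * sy
    d = sin_t * sx
    e = sin_t * sx * shear + cos_t * sy
    return (a, b, tx, d, e, ty)


def apply_affine(t: Affine, x: float, y: float) -> Tuple[float, float]:
    a, b, c, d, e, f = t
    return (a * x + b * y + c, d * x + e * y + f)


# Rotate by 90 degrees, then shift 10 pixels along x.
t = compose_affine(tx=10, ty=0, angle=math.pi / 2, sx=1, sy=1, shear=0)
print(apply_affine(t, 1, 0))  # approximately (10.0, 1.0)
```

A single translation vector could not map (1, 0) to (10, 1) and (0, 1) to (9, 0) at the same time, which is the limitation of block translation noted in the background.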

Another source of error in conventional block-based video compression 
methods is motion between objects included within a transformation block. The 
object-based video compression or encoding of this invention substantially eliminates 
the relative motion between objects within transformation blocks. As a result,
transformation error arising from inter-object motion also is substantially decreased. 
The low transformation errors arising from the encoder process of this invention 
allow it to provide compression ratios up to 300% greater than those obtainable from 
prior encoder processes such as MPEG-2.

The foregoing and other features and advantages of the preferred embodiment 
of the present invention will be more readily apparent from the following detailed 
description, which proceeds with reference to the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of a computer system that may be used to 
implement a method and apparatus embodying the invention. 

Figs. 2A and 2B are simplified representations of a display screen of a video 
display device showing two successive image frames corresponding to a video 
signal.

Fig. 3A is a generalized functional block diagram of a video compression
encoder process for compressing digitized video signals representing display motion
in video sequences of multiple image frames. Fig. 3B is a functional block diagram
of a master object encoder process according to this invention.

Fig. 4 is a functional block diagram of an object segmentation process for
segmenting selected objects from an image frame of a video sequence.

Fig. 5A is a simplified representation of a display screen of the video display
device of Fig. 2A, and Fig. 5B is an enlarged representation of a portion of the
display screen of Fig. 5A.

Fig. 6 is a functional block diagram of a polygon match process for
determining a motion vector for corresponding pairs of pixels in corresponding
objects in successive image frames.

Figs. 7A and 7B are simplified representations of a display screen showing
two successive image frames with two corresponding objects.

Fig. 8 is a functional block diagram of an alternative pixel block correlation
process.
Fig. 9A is a schematic representation of a first pixel block used for
identifying corresponding pixels in different image frames. Fig. 9B is a schematic 
representation of an array of pixels corresponding to a search area in a prior image 
frame where corresponding pixels are sought. Figs. 9C-9G are schematic 
representations of the first pixel block being scanned across the pixel array of
Fig. 9B to identify corresponding pixels.

Fig. 10A is a schematic representation of a second pixel block used for
identifying corresponding pixels in different image frames. Figs. 10B-10F are
schematic representations of the second pixel block being scanned across the pixel
array of Fig. 9B to identify corresponding pixels.

Fig. 11A is a schematic representation of a third pixel block used for
identifying corresponding pixels in different image frames. Figs. 11B-11F are
schematic representations of the third pixel block being scanned across the pixel
array of Fig. 9B.

Fig. 12 is a functional block diagram of a multi-dimensional transformation
method that includes generating a mapping between objects in first and second
successive image frames and quantizing the mapping for transmission or storage.

Fig. 13 is a simplified representation of a display screen showing the image
frame of Fig. 7B for purposes of illustrating the multi-dimensional transformation
method of Fig. 12.

Fig. 14 is an enlarged simplified representation showing three selected pixels 
of a transformation block used in the quantization of affine transformation 
coefficients determined by the method of Fig. 12. 

Fig. 15 is a functional block diagram of a transformation block optimization
method utilized in an alternative embodiment of the multi-dimensional
transformation method of Fig. 12.

Fig. 16 is a simplified fragmentary representation of a display screen showing 
the image frame of Fig. 7B for purposes of illustrating the transformation block 
optimization method of Fig. 15. 

Figs. 17A and 17B are a functional block diagram of a precompression
extrapolation method for extrapolating image features of arbitrary configuration to a
predefined configuration to facilitate compression.

Figs. 18A-18D are representations of a display screen on which a simple 
object is rendered to show various aspects of the extrapolation method of Figs. 17A and 17B.

Figs. 19A and 19B are functional block diagrams of an encoder method and 
a decoder method, respectively, employing a Laplacian pyramid encoder method in
accordance with this invention. 

Figs. 20A-20D are simplified representations of the color component values 
of an arbitrary set or array of pixels processed according to the encoder process of 
Fig. 19A.

Fig. 21 is a functional block diagram of a motion vector encoding process
according to this invention.

Fig. 22 is a functional block diagram of an alternative quantized object 
encoder-decoder process. 

Fig. 23A is a generalized functional block diagram of a video compression 
decoder process matched to the encoder process of Fig. 3. Fig. 23B is a functional
diagram of a master object decoder process according to this invention. 

Fig. 24A is a diagrammatic representation of a conventional chain code 
format. Fig. 24B is a simplified representation of an exemplary contour for 
processing with the chain code format of Fig. 24A. 

Fig. 25A is a functional block diagram of a chain coding process of this
invention.

Fig. 25B is a diagrammatic representation of a chain code format of the 
present invention. 

Fig. 25C is a diagrammatic representation of special case chain code 
modifications used in the process of Fig. 25A.

Fig. 26 is a functional block diagram of a sprite generating or encoding 
process. 

Figs. 27A and 27B are respective first and second objects defined by bitmaps 
and showing grids of triangles superimposed over the objects in accordance with the 
process of Fig. 26.

Fig. 28 is a functional block diagram of a sprite decoding process 
corresponding to the encoding process of Fig. 26.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 
Referring to Fig. 1, an operating environment for the preferred embodiment
of the present invention is a computer system 20, either of a general purpose or a
dedicated type, that comprises at least one high speed central processing unit (CPU) 
22, in conjunction with a memory system 24, an input device 26, and an output 
device 28. These elements are interconnected by a bus structure 30. 

The illustrated CPU 22 is of familiar design and includes an ALU 32 for 
performing computations, a collection of registers 34 for temporary storage of data
and instructions, and a control unit 36 for controlling operation of the system 20. 
CPU 22 may be a processor having any of a variety of architectures including Alpha 
from Digital, MIPS from MIPS Technology, NEC, IDT, Siemens, and others, x86 
from Intel and others, including Cyrix, AMD, and Nexgen, and the PowerPC from
IBM and Motorola.

The memory system 24 includes main memory 38 and secondary storage 40. 
Illustrated main memory 38 takes the form of 16 megabytes of semiconductor RAM 
memory. Secondary storage 40 takes the form of long term storage, such as ROM, 
optical or magnetic disks, flash memory, or tape. Those skilled in the art will 
appreciate that memory system 24 may comprise many other alternative components.

The input and output devices 26, 28 are also familiar. The input device 26 
can comprise a keyboard, a mouse, a physical transducer (e.g., a microphone), etc. 
The output device 28 can comprise a display, a printer, a transducer (e.g. a speaker), 
etc. Some devices, such as a network interface or a modem, can be used as input 
and/or output devices.

As is familiar to those skilled in the art, the computer system 20 further 
includes an operating system and at least one application program. The operating 
system is the set of software which controls the computer system's operation and the 
allocation of resources. The application program is the set of software that performs 
a task desired by the user, making use of computer resources made available through
the operating system. Both are resident in the illustrated memory system 24. 
In accordance with the practices of persons skilled in the art of computer
programming, the present invention is described below with reference to symbolic 
representations of operations that are performed by computer system 20, unless 
indicated otherwise. Such operations are sometimes referred to as being 
computer-executed. It will be appreciated that the operations which are symbolically
represented include the manipulation by CPU 22 of electrical signals representing 
data bits and the maintenance of data bits at memory locations in memory system 
24, as well as other processing of signals. The memory locations where data bits are 
maintained are physical locations that have particular electrical, magnetic, or optical 
properties corresponding to the data bits.

Figs. 2A and 2B are simplified representations of a display screen 50 of a 
video display device 52 (e.g., a television or a computer monitor) showing two 
successive image frames 54a and 54b of a video image sequence represented 
electronically by a corresponding video signal. Video signals may be in any of a 
variety of video signal formats including analog television video formats such as
NTSC, PAL, and SECAM, and pixelated or digitized video signal formats typically
used in computer displays, such as VGA, CGA, and EGA. Preferably, the video 
signals corresponding to image frames are of a digitized video signal format, either 
as originally generated or by conversion from an analog video signal format, as is 
known in the art.

Image frames 54a and 54b each include a rectangular solid image feature 56 
and a pyramid image feature 58 that are positioned over a background 60. Image 
features 56 and 58 in image frames 54a and 54b have different appearances because 
different parts are obscured and shown. For purposes of the following description, 
the particular form of an image feature in an image frame is referred to as an object
or, alternatively, a mask. Accordingly, rectangular solid image feature 56 is shown 
as rectangular solid objects 56a and 56b in respective image frames 54a and 54b, 
and pyramid image feature 58 is shown as pyramid objects 58a and 58b in respective 
image frames 54a and 54b. 

Pyramid image feature 58 is shown with the same position and orientation in
image frames 54a and 54b and would "appear" to be motionless when shown in the
video sequence. Rectangular solid 56 is shown in frames 54a and 54b with a
different orientation and position relative to pyramid 58 and would "appear" to be
moving and rotating relative to pyramid 58 when shown in the video sequence.
These appearances of image features 56 and 58 are figurative and exaggerated. The
image frames of a video sequence typically are displayed at rates in the range of
30-80 Hz. Human perception of video motion typically requires more than two image
frames. Image frames 54a and 54b provide, therefore, a simplified representation of 
a conventional video sequence for purposes of illustrating the present invention. 
Moreover, it will be appreciated that the present invention is in no way limited to 
such simplified video images, image features, or sequences and, to the contrary, is 
applicable to video images and sequences of arbitrary complexity. 

VIDEO COMPRESSION ENCODER PROCESS OVERVIEW 

Fig. 3A is a generalized functional block diagram of a video compression 
encoder process 64 for compressing digitized video signals representing display 
motion in video sequences of multiple image frames. Compression of video 
information (i.e., video sequences or signals) can provide economical storage and 
transmission of digital video information in applications that include, for example, 
interactive or digital television and multimedia computer applications. For purposes 
of brevity, the reference numerals assigned to function blocks of encoder process 64 
are used interchangeably in reference to the results generated by the function blocks. 

Conventional video compression techniques utilize similarities between 
successive image frames, referred to as temporal or interframe correlation, to provide 
interframe compression in which pixel-based representations of image frames are 
converted to motion representations. In addition, conventional video compression 
techniques utilize similarities within image frames, referred to as spatial or 
intraframe correlation, to provide intraframe compression in which the motion 
representations within an image frame are further compressed. 

In such conventional video compression techniques, including MPEG-1, 
MPEG-2, and H.26X, the temporal and spatial correlations are determined relative to 
simple translations of fixed, regular (e.g., square) arrays of pixels. Video 
information commonly includes, however, arbitrary video motion that cannot be
represented accurately by translating square arrays of pixels. As a consequence, 
conventional video compression techniques typically include significant error 
components that limit the compression rate and accuracy. 

In contrast, encoder process 64 utilizes object-based video compression to
improve the accuracy and versatility of encoding interframe motion and intraframe 
image features. Encoder process 64 compresses video information relative to objects 
of arbitrary configurations, rather than fixed, regular arrays of pixels. This reduces 
the error components and thereby improves the compression efficiency and accuracy.
As another benefit, object-based video compression provides interactive video editing
capabilities for processing compressed video information. 

Referring to Fig. 3A, function block 66 indicates that user-defined objects 
within image frames of a video sequence are segmented from other objects within 
the image frames. The objects may be of arbitrary configuration and preferably
represent distinct image features in a display image. Segmentation includes
identifying the pixels in the image frames corresponding to the objects. The
user-defined objects are defined in each of the image frames in the video sequence. In
Figs. 2A and 2B, for example, rectangular solid objects 56a and 56b and pyramid 
objects 58a and 58b are separately segmented. 

The segmented objects are represented by binary or multi-bit (e.g., 8-bit)
"alphachannel" masks of the objects. The object masks indicate the size,
configuration, and position of an object on a pixel-by-pixel basis. For purposes of
simplicity, the following description is directed to binary masks in which each pixel
of the object is represented by a single binary bit rather than the typical 24 bits (i.e.,
8 bits for each of three color component values). Multi-bit (e.g., 8-bit) masks also
have been used.
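
To make the mask representation concrete, the sketch below derives the size (pixel count) and position (bounding box) of an object from a binary mask stored as a 2-D list of 0/1 values, and packs the mask at one bit per pixel. The row-major, most-significant-bit-first packing order and the function name are assumptions for illustration. A 16x16 mask packs into 32 bytes, compared with 768 bytes for the same pixels at 24 bits each.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]


def mask_summary(mask: Mask) -> Tuple[int, Optional[Tuple[int, int, int, int]], bytes]:
    """Return (pixel count, inclusive bounding box (x0, y0, x1, y1), packed bits)."""
    coords = [(x, y) for y, row in enumerate(mask)
              for x, v in enumerate(row) if v]
    bbox = None
    if coords:
        xs = [x for x, _ in coords]
        ys = [y for _, y in coords]
        bbox = (min(xs), min(ys), max(xs), max(ys))
    bits = [v for row in mask for v in row]      # row-major scan order
    packed = bytearray((len(bits) + 7) // 8)
    for i, v in enumerate(bits):
        if v:
            packed[i // 8] |= 0x80 >> (i % 8)   # most significant bit first
    return len(coords), bbox, bytes(packed)


mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
area, bbox, packed = mask_summary(mask)
print(area, bbox, packed.hex())  # 4 (1, 1, 2, 2) 0660
```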

Function block 68 indicates that "feature points" of each object are defined 
by a user. Feature points preferably are distinctive features or aspects of the object. 
For example, corners 70a-70c and corners 72a-72c could be defined by a user as 
feature points of rectangular solid 56 and pyramid 58, respectively. The pixels
corresponding to each object mask and its feature points in each image frame are 
stored in an object database included in memory system 24.

Function block 74 indicates that changes in the positions of feature points in 
successive image frames are identified and trajectories determined for the feature 
points between successive image frames. The trajectories represent the direction and 
extent of movement of the feature points. Function block 76 indicates that 
trajectories of the feature points in the object between prior frame N-1 and current
frame N also are retrieved from the object database.

Function block 78 indicates that a sparse motion transformation is determined 
for the object between prior frame N-1 and current frame N. The sparse motion 
transformation is based upon the feature point trajectories between frames N-1 and 
N. The sparse motion transformation provides an approximation of the change of 
the object between prior frame N-1 and current frame N.
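
The text does not fix the form of the sparse motion transformation. One plausible sketch, assuming an affine model, fits its six coefficients by least squares to the feature point positions in frames N-1 and N, each pair of positions being one feature point trajectory. At least three non-collinear feature points are needed; the helper names and the Cramer's-rule solver are illustrative choices, not the document's method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the 3x3 system m @ s = v by Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("need at least three non-collinear feature points")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        solution.append(_det3(mc) / d)
    return solution


def fit_affine(prev_pts: List[Point], cur_pts: List[Point]):
    """Least-squares (a, b, c, d, e, f) with x' = a*x + b*y + c, y' = d*x + e*y + f."""
    ata = [[0.0] * 3 for _ in range(3)]   # normal matrix A^T A, rows of A are (x, y, 1)
    atx = [0.0] * 3                        # A^T x'
    aty = [0.0] * 3                        # A^T y'
    for (x, y), (xn, yn) in zip(prev_pts, cur_pts):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * xn
            aty[i] += row[i] * yn
    a, b, c = _solve3(ata, atx)
    d, e, f = _solve3(ata, aty)
    return (a, b, c, d, e, f)


prev = [(0, 0), (1, 0), (0, 1), (1, 1)]        # feature points in frame N-1
cur = [(x + 2, y + 3) for x, y in prev]        # same points in frame N
print(fit_affine(prev, cur))  # coefficients close to (1, 0, 2, 0, 1, 3): a shift by (2, 3)
```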

Function block 80 indicates that a mask of an object in a current frame N is 
retrieved from the object data base in memory system 24. 

Function block 90 indicates that a quantized master object or "sprite" is 
formed from the objects or masks 66 corresponding to an image feature in an image 
frame sequence and feature point trajectories 74. The master object preferably 
includes all of the aspects or features of an object as it is represented in multiple 
frames. With reference to Figs. 2A and 2B, for example, rectangular solid 56 in 
frame 54b includes a side 78b not shown in frame 54a. Similarly, rectangular solid 
56 includes a side 78a in frame 54a not shown in frame 54b. The master object for 
rectangular solid 56 includes both sides 78a and 78b. 

Sparse motion transformation 78 frequently will not provide a complete
representation of the change in the object between frames N-1 and N. For example,
an object in a prior frame N-1, such as rectangular object 56a, might not include all
the features of the object in the current frame N, such as side 78b of rectangular
object 56b.

To improve the accuracy of the transformation, therefore, an intersection of
the masks of the object in prior frame N-1 and current frame N is determined, such
as by a logical AND function as is known in the art. The resulting intersection is
subtracted from the mask of the object in current frame N to identify any portions
or features of the object in current frame N not included in the object in prior
frame N-1 (e.g., side 78b of rectangular solid 56b, as described above). The
newly identified portions of the object are incorporated into master object 90 so that
it includes a complete representation of the object in frames N-1 and N.
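The mask arithmetic just described can be sketched in Python as follows. This is a minimal illustration, assuming masks are sets of (x, y) pixel coordinates already expressed in a common coordinate frame for the master object; the function names are hypothetical, and the sketch tracks only pixel coverage, not the pixel values that master object 90 would actually store.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (x, y); assumed to be in master-object coordinates

def new_portions(mask_prev: Set[Pixel], mask_curr: Set[Pixel]) -> Set[Pixel]:
    """Portions of the object present in frame N but absent from frame N-1."""
    intersection = mask_prev & mask_curr   # logical AND of the two masks
    return mask_curr - intersection        # current mask minus the intersection

def grow_master(master: Set[Pixel], mask_prev: Set[Pixel],
                mask_curr: Set[Pixel]) -> Set[Pixel]:
    """Incorporate newly visible portions into the master object (coverage only)."""
    return master | new_portions(mask_prev, mask_curr)
```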
Function block 96 indicates that a quantized form of an object 98 in a prior
frame N-1 (e.g., rectangular solid object 56a in image frame 54a) is transformed by
a dense motion transformation to provide a predicted form of the object 102 in a
current frame N (e.g., rectangular solid object 56b in image frame 54b). This
transformation provides object-based interframe compression.
The dense motion transformation preferably includes determining an affine
transformation between quantized prior object 98 in frame N-1 and the object in the
current frame N and applying the affine transformation to quantized prior object 98.
The preferred affine transformation is represented by affine transformation
coefficients 104 and is capable of describing translation, rotation, magnification, and
shear. The affine transformation is determined from a dense motion estimation,
preferably including a pixel-by-pixel mapping, between prior quantized object 98
and the object in the current frame N.
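As a minimal sketch of how six affine coefficients can express translation, rotation, magnification, and shear, the Python below composes them into the form x' = ax + by + c, y' = dx + ey + f and forward-maps the pixels of a prior object. The coefficient layout, the composition order, and the names `Affine`, `affine_from_motion`, and `warp_object` are assumptions for illustration; the text does not specify how coefficients 104 are ordered, and it determines them from a dense motion estimation rather than from explicit motion parameters.

```python
import math
from typing import Dict, NamedTuple, Tuple

class Affine(NamedTuple):
    """Hypothetical coefficient layout: x' = a*x + b*y + c, y' = d*x + e*y + f."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def apply(self, x: float, y: float) -> Tuple[float, float]:
        return (self.a * x + self.b * y + self.c,
                self.d * x + self.e * y + self.f)

def affine_from_motion(tx: float, ty: float, theta: float,
                       scale: float, shear: float) -> Affine:
    """Linear part = rotation(theta) @ [[scale, shear], [0, scale]]; then translate."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return Affine(scale * cos_t, shear * cos_t - scale * sin_t, tx,
                  scale * sin_t, shear * sin_t + scale * cos_t, ty)

def warp_object(obj: Dict[Tuple[int, int], int], t: Affine) -> Dict[Tuple[int, int], int]:
    """Forward-map each pixel of a prior object to its rounded predicted position."""
    out = {}
    for (x, y), value in obj.items():
        xp, yp = t.apply(x, y)
        out[(round(xp), round(yp))] = value
    return out
```

Forward mapping with rounding can leave holes when the object is magnified; a practical implementation would more likely inverse-map each destination pixel and interpolate.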

Predicted current object 102 is represented by quantized prior object 98, as
modified by dense motion transformation 96, and is capable of representing
relatively complex motion, together with any new image aspects obtained from
master object 90. Such object-based representations are relatively accurate because
the perceptual and spatial continuity associated with objects eliminates errors arising
from the typically changing relationships between different objects in different image
frames. Moreover, the object-based representations allow a user to represent
different objects with different levels of resolution to optimize the relative efficiency
and accuracy for representing objects of varying complexity.

Function block 106 indicates that for image frame N, predicted current object
102 is subtracted from original object 108 for current frame N to determine an
estimated error 110 in predicted object 102. Estimated error 110 is a compressed
representation of current object 108 in image frame N relative to quantized prior
object 98. More specifically, current object 108 may be decoded or reconstructed
from estimated error 110 and quantized prior object 98.

Function block 112 indicates that estimated error 110 is compressed or
"coded" by a conventional "lossy" still image compression method such as lattice
subband or other wavelet compression or encoding as described in Multirate Systems
and Filter Banks by Vaidyanathan, PTR Prentice-Hall, Inc., Englewood Cliffs, New
Jersey (1993), or discrete cosine transform (DCT) encoding as described in JPEG:
Still Image Data Compression Standard by Pennebaker et al., Van Nostrand
Reinhold, New York (1993).

As is known in the art, "lossy" compression methods introduce some data
distortion to provide increased data compression. The data distortion refers to
variations between the original data before compression and the data resulting after
compression and decompression. For purposes of illustration below, the
compression or encoding of function block 112 is presumed to be wavelet encoding.
Function block 114 indicates that the wavelet encoded estimated error from
function block 112 is further compressed or "coded" by a conventional "lossless"
still image compression method to form compressed data 116. A preferred
conventional "lossless" still image compression method is entropy encoding as
described in JPEG: Still Image Data Compression Standard by Pennebaker et al. As
is known in the art, "lossless" compression methods introduce no data distortion.
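The difference between the lossy and lossless stages can be illustrated with a small Python sketch. Here a uniform scalar quantizer stands in for the lossy wavelet or DCT stage, and zlib's DEFLATE stands in for the entropy coder; neither substitution is the method named in the text, but both show the property at issue: the lossless stage round-trips exactly, while the lossy stage does not.

```python
import zlib
from typing import List

def lossy_quantize(values: List[float], step: float) -> List[int]:
    """Uniform scalar quantization: a simple stand-in for the lossy stage."""
    return [round(v / step) for v in values]

def dequantize(indices: List[int], step: float) -> List[float]:
    return [q * step for q in indices]

def lossless_pack(indices: List[int]) -> bytes:
    """DEFLATE stands in for entropy encoding; any lossless coder round-trips exactly."""
    return zlib.compress(",".join(map(str, indices)).encode("ascii"))

def lossless_unpack(blob: bytes) -> List[int]:
    text = zlib.decompress(blob).decode("ascii")
    return [int(t) for t in text.split(",")] if text else []
```

Running `lossless_unpack(lossless_pack(q))` returns exactly `q`, whereas `dequantize(lossy_quantize(v, step), step)` generally differs from `v` by up to half a step per value.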
An error feedback loop 118 utilizes the wavelet encoded estimated error from
function block 112 for the object in frame N to obtain a prior quantized object for
succeeding frame N+1. As an initial step in feedback loop 118, function block 120
indicates that the wavelet encoded estimated error from function block 112 is inverse
wavelet coded, or wavelet decoded, to form a quantized error 122 for the object in
image frame N.

The effect of successively encoding and decoding estimated error 110 by a
lossy still image compression method is to omit from quantized error 122 video
information that is generally imperceptible by viewers. This information typically is
of higher frequencies. As a result, omitting such higher frequency components
typically can provide image compression of up to about 200% with only minimal
degradation of image quality.
Function block 124 indicates that quantized error 122 and predicted object
102, both for image frame N, are added together to form a quantized object 126 for
image frame N. After a timing coordination delay 128, quantized object 126
becomes quantized prior object 98 and is used as the basis for processing the
corresponding object in image frame N+1.
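The error feedback loop can be sketched as a closed-loop encoder in Python. This is a simplified illustration under stated assumptions: an object is a flat list of pixel values, the dense motion transformation is replaced by identity motion, and a uniform quantizer stands in for wavelet encoding followed by decoding (blocks 112 and 120). The point it demonstrates is that the encoder predicts from the same quantized object a decoder can rebuild, so quantization error does not accumulate from frame to frame.

```python
from typing import List, Tuple

def quantize_error(err: List[float], step: float) -> List[float]:
    """Lossy encode + decode of the estimated error (stand-in for blocks 112 and 120)."""
    return [round(e / step) * step for e in err]

def encode_sequence(frames: List[List[float]],
                    step: float) -> Tuple[List[List[float]], List[List[float]]]:
    prior_q = [0.0] * len(frames[0])   # quantized prior object (all zeros before frame 0)
    errors, recon = [], []
    for original in frames:
        predicted = prior_q            # identity motion stands in for dense transformation 96
        estimated = [o - p for o, p in zip(original, predicted)]   # block 106
        q_err = quantize_error(estimated, step)                     # quantized error 122
        quantized = [p + e for p, e in zip(predicted, q_err)]       # block 124 -> object 126
        errors.append(q_err)           # in practice the quantizer output would be entropy coded
        recon.append(quantized)
        prior_q = quantized            # delay 128: becomes the next prior object
    return errors, recon

def decode_sequence(errors: List[List[float]], length: int) -> List[List[float]]:
    """Decoder mirror: add each received quantized error to its own previous output."""
    prior = [0.0] * length
    out = []
    for q_err in errors:
        prior = [p + e for p, e in zip(prior, q_err)]
        out.append(prior)
    return out
```

Because encoder and decoder add the same quantized errors to the same prior, their reconstructions match exactly, and each reconstructed value stays within half a quantizer step of the original.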

Encoder process 64 utilizes the temporal correlation of corresponding objects
in successive image frames to obtain improved interframe compression, and also
utilizes the spatial correlation within objects to obtain accurate and efficient
intraframe compression. For the interframe compression, motion estimation and
compensation are performed so that an object defined in one frame can be estimated
in a successive frame. The motion-based estimation of the object in the successive
frame requires significantly less information than a conventional block-based
representation of the object. For the intraframe compression, an estimated error
signal for each object is compressed to utilize the spatial correlation of the object
within a frame and to allow different objects to be represented at different
resolutions. Feedback loop 118 allows objects in subsequent frames to be predicted
from fully decompressed objects, thereby preventing accumulation of estimation
error.

Encoder process 64 provides as an output a compressed or encoded
representation of a digitized video signal representing display motion in video
sequences of multiple image frames. The compressed or encoded representation
includes object masks 66, feature points 68, affine transform coefficients 104, and
compressed error data 116. The encoded representation may be stored or
transmitted, according to the particular application in which the video information is
used.

Fig. 3B is a functional block diagram of a master object encoder process 130
for encoding or compressing master object 90. Function block 132 indicates that
master object 90 is compressed or coded by a conventional "lossy" still image
compression method such as lattice subband or other wavelet compression or discrete
cosine transform (DCT) encoding. Preferably, function block 132 employs wavelet
encoding.
Function block 134 indicates that the wavelet encoded master object from
function block 132 is further compressed or coded by a conventional "lossless" still 
image compression method to form compressed master object data 136. A preferred 
conventional lossless still image compression method is entropy encoding. 

Encoder process 130 provides as an output compressed master object 136. 
Together with the compressed or encoded representations provided by encoder 
process 64, compressed master object 136 may be decompressed or decoded after
storage or transmission to obtain a video sequence of multiple image frames. 

Encoder process 64 is described with reference to encoding video information 
corresponding to a single object within an image frame. As shown in Figs. 2A and 
2B and indicated above, encoder process 64 is performed separately for each of the 
objects (e.g., objects 56 and 58 of Figs. 2A and 2B) in an image frame. Moreover, 
many video images include a background over which arbitrary numbers of image 
features or objects are rendered. Preferably, the background is processed as an 
object according to this invention after all user-designated objects are processed. 

Processing of the objects in an image frame requires that the objects be 
separately identified. Preferably, encoder process 64 is applied to the objects of an 
image frame beginning with the forward-most object or objects and proceeding 
successively to the back-most object (e.g., the background). The compositing of the 
encoded objects into a video image preferably proceeds from the rear-most object 
(e.g., the background) and proceeds successively to the forward-most object (e.g., 
rectangular solid 56 in Figs. 2A and 2B). The layering of encoded objects may be
communicated as distinct layering data associated with the objects of an image frame 
or, alternatively, by transmitting or obtaining the encoded objects in a sequence 
corresponding to the layering or compositing sequence. 
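Compositing from the rear-most object forward is the familiar painter's algorithm. A minimal Python sketch, assuming each decoded object is a mapping from its mask pixels to values and that objects arrive in encoding order (front-most first, background last); `composite` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Layer = Dict[Pixel, str]   # decoded object: mask pixel -> value (str used for readability)

def composite(layers_front_to_back: List[Layer]) -> Layer:
    """Paint decoded objects rear-most first so nearer objects overwrite farther ones."""
    frame: Layer = {}
    for layer in reversed(layers_front_to_back):   # background first, front-most last
        frame.update(layer)
    return frame
```

Reversing the encoding order is what lets the layering be implied by transmission sequence alone, as the text allows.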

OBJECT SEGMENTATION AND TRACKING 

In a preferred embodiment, the segmentation of objects within image frames
referred to in function block 66 allows interactive segmentation by users. The object
segmentation of this invention provides improved accuracy in segmenting objects,
is relatively fast, and provides users with optimal flexibility in defining objects to
be segmented.

Fig. 4 is a functional block diagram of an object segmentation process 140 
for segmenting selected objects from an image frame of a video sequence. Object 
segmentation according to process 140 provides a perceptual grouping of objects that 
is accurate and quick and easy for users to define. 

Fig. 5A is a simplified representation of display screen 50 of video display
device 52 showing image frame 54a and the segmentation of rectangular solid object
56a. In its rendering on display screen 50, rectangular solid object 56a includes an
object perimeter 142 (shown spaced apart from object 56a for clarity) that bounds an
object interior 144. Object interior 144 refers to the outline of object 56a on display
screen 50 and in general may correspond to an inner surface or, as shown, an outer
surface of the image feature. Fig. 5B is an enlarged representation of a portion of
display screen 50 showing the semi-automatic segmentation of rectangular solid
object 56a. The following description is made with specific reference to rectangular
solid object 56a, but is similarly applicable to each object to be segmented from an
image frame.

Function block 146 indicates that a user forms within object interior 144 an 
interior outline 148 of object perimeter 142. The user preferably forms interior 
outline 148 with a conventional pointer or cursor control device, such as a mouse or 
trackball. Interior outline 148 is formed within a nominal distance 150 from object 
perimeter 142. Nominal distance 150 is selected by a user to be sufficiently large 
that the user can form interior outline 148 relatively quickly within nominal distance 
150 of perimeter 142. Nominal distance 150 corresponds, for example, to between 
about 4 and 10 pixels. 

Function block 146 is performed in connection with a key frame of a video 
sequence. With reference to a scene in a conventional motion picture, for example, 
the key frame could be the first frame of the multiple frames in a scene. The 
participation of the user in this function renders object segmentation process 140 
semi-automatic, but significantly increases the accuracy and flexibility with which 
objects are segmented. Other than for the key frame, objects in subsequent image 
frames are segmented automatically as described below in greater detail. 
Function block 152 indicates that interior outline 148 is expanded
automatically to form an exterior outline 156. The formation of exterior outline 156
is performed as a relatively simple image magnification of outline 148 so that
exterior outline 156 is a user-defined number of pixels from interior outline 148.
Preferably, the distance between interior outline 148 and exterior outline 156 is
approximately twice distance 150.

Function block 158 indicates that pixels between interior outline 148 and
exterior outline 156 are classified according to predefined attributes as to whether
they are within object interior 144, thereby to identify automatically object perimeter
142 and a corresponding mask 80 of the type described with reference to Fig. 3A.
Preferably, the image attributes include pixel color and position, but either attribute
could be used alone or with other attributes.

In the preferred embodiment, each of the pixels in interior outline 148 and
exterior outline 156 defines a "cluster center" represented as a five-dimensional
vector in the form of $(r, g, b, x, y)$. The terms $r$, $g$, and $b$ correspond to the
respective red, green, and blue color components associated with each of the pixels,
and the terms $x$ and $y$ correspond to the pixel locations. The $m$ cluster
center vectors corresponding to pixels in interior outline 148 are denoted as
$\{I_0, I_1, \ldots, I_{m-1}\}$, and the $n$ cluster center vectors corresponding to pixels
in exterior outline 156 are denoted as $\{O_0, O_1, \ldots, O_{n-1}\}$.

Pixels between the cluster center vectors $I_i$ and $O_j$ are classified by
identifying the vector to which each pixel is closest in the five-dimensional vector
space. For each pixel, the absolute distances $d_i$ and $d_j$ to each of the respective cluster
center vectors $I_i$ and $O_j$ are computed according to the following equations:

$$d_i = w_{\text{color}}\left(|r - r_i| + |g - g_i| + |b - b_i|\right) + w_{\text{coord}}\left(|x - x_i| + |y - y_i|\right), \quad 0 \le i < m,$$

$$d_j = w_{\text{color}}\left(|r - r_j| + |g - g_j| + |b - b_j|\right) + w_{\text{coord}}\left(|x - x_j| + |y - y_j|\right), \quad 0 \le j < n,$$

in which $w_{\text{color}}$ and $w_{\text{coord}}$ are weighting factors for the respective color and pixel
position information. Weighting factors $w_{\text{color}}$ and $w_{\text{coord}}$ have values that sum
to 1 and are otherwise selectable by a user. Preferably, weighting factors $w_{\text{color}}$ and
$w_{\text{coord}}$ are of an equal value of 0.5. Each pixel is associated with object interior 144
or exterior according to the minimum five-dimensional distance to one of the cluster
center vectors $I_i$ and $O_j$.
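The classification rule above can be written directly as a nearest-cluster-center test. The Python sketch below assumes each pixel and cluster center is an (r, g, b, x, y) tuple and that ties go to the interior; the tie rule and the function names are assumptions, not part of the text.

```python
from typing import Sequence, Tuple

Vec5 = Tuple[float, float, float, float, float]  # (r, g, b, x, y)

def weighted_l1(p: Vec5, c: Vec5, w_color: float = 0.5, w_coord: float = 0.5) -> float:
    """Weighted absolute distance between a pixel and one cluster center."""
    color = abs(p[0] - c[0]) + abs(p[1] - c[1]) + abs(p[2] - c[2])
    coord = abs(p[3] - c[3]) + abs(p[4] - c[4])
    return w_color * color + w_coord * coord

def is_interior(pixel: Vec5, interior_centers: Sequence[Vec5],
                exterior_centers: Sequence[Vec5],
                w_color: float = 0.5, w_coord: float = 0.5) -> bool:
    """True if the nearest center (min over I_i and O_j) is an interior center."""
    d_in = min(weighted_l1(pixel, c, w_color, w_coord) for c in interior_centers)
    d_out = min(weighted_l1(pixel, c, w_color, w_coord) for c in exterior_centers)
    return d_in <= d_out   # assumed tie rule: ties go to the interior
```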

Function block 162 indicates that a user selects at least two, and preferably
more (e.g., 4 to 6), feature points in each object of an initial or key frame.
Preferably, the feature points are relatively distinctive aspects of the object. With 
reference to rectangular solid image feature 56, for example, corners 70a-70c could 
be selected as feature points. 

Function block 164 indicates that a block 166 of multiple pixels centered 
about each selected feature point (e.g., corners 70a-70c) is defined and matched to a 
corresponding block in a subsequent image frame (e.g., the next successive image 
frame). Pixel block 166 is user defined, but preferably includes a 32 x 32 pixel 
array that includes only pixels within object interior 144. Any pixels 168 (indicated
by cross-hatching) of pixel block 166 falling outside object interior 144 as 
determined by function block 158 (e.g., corners 70b and 70c) are omitted. Pixel 
blocks 166 are matched to the corresponding pixel blocks in the next image frame 
according to a minimum absolute error identified by a conventional block match 
process or a polygon match process, as described below in greater detail. 

Function block 170 indicates that a sparse motion transformation of an object 
is determined from the corresponding feature points in two successive image frames. 
Function block 172 indicates that mask 80 of the current image frame is transformed 
according to the sparse motion transformation to provide an estimation of the mask 
80 for the next image frame. Any feature point in a current frame not identified in 
a successive image frame is disregarded. 
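The text does not specify the form of the sparse motion transformation at this point. As one possibility, the Python sketch below fits a six-coefficient affine model to the feature point correspondences by least squares and then applies it to the current mask. The affine model, the least-squares fit, and the names `fit_affine` and `warp_mask` are assumptions. The fit needs at least three non-collinear points; with only the two-point minimum mentioned above, a model with fewer parameters would be required.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]

def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("feature points are too few or collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x

def fit_affine(prev_pts: List[Point], curr_pts: List[Point]) -> Tuple[float, ...]:
    """Least-squares (a, b, c, d, e, f): x' = a*x + b*y + c, y' = d*x + e*y + f."""
    ata = [[0.0] * 3 for _ in range(3)]   # normal-equation matrix A^T A
    atx = [0.0] * 3                        # A^T x'
    aty = [0.0] * 3                        # A^T y'
    for (x, y), (xp, yp) in zip(prev_pts, curr_pts):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * xp
            aty[i] += row[i] * yp
    return tuple(_solve3(ata, atx) + _solve3(ata, aty))

def warp_mask(mask: Set[Tuple[int, int]], coeffs: Tuple[float, ...]) -> Set[Tuple[int, int]]:
    """Estimate the next frame's mask by mapping each mask pixel (rounded)."""
    a, b, c, d, e, f = coeffs
    return {(round(a * x + b * y + c), round(d * x + e * y + f)) for x, y in mask}
```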

Function block 174 indicates that the resulting estimation of mask 80 for the 
next image frame is delayed by one frame, and functions as an outline 176 for a 
next successive cycle. Similarly, function block 178 indicates that the corresponding 
feature points also are delayed by one frame, and utilized as the initial feature points 
180 for the next successive frame.

POLYGON MATCH METHOD

Fig. 6 is a functional block diagram of a polygon match process 200 for
determining a motion vector for each corresponding pair of pixels in successive
image frames. Such a dense motion vector determination provides the basis for
determining the dense motion transformations 96 of Fig. 3A.

Polygon match process 200 is capable of determining extensive motion
between successive image frames like the conventional block match process. In
contrast to the conventional block match process, however, polygon match process
200 maintains its accuracy for pixels located near or at an object perimeter and
generates significantly less error. A preferred embodiment of polygon match method
200 has improved computational efficiency.

Polygon match method 200 is described with reference to Figs. 7A and 7B,
which are simplified representations of display screen 50 showing two successive
image frames 202a and 202b in which an image feature 204 is rendered as objects
204a and 204b, respectively.

Function block 206 indicates that objects 204a and 204b for image frames 
202a and 202b are identified and segmented by, for example, object segmentation 
method 140. 

Function block 208 indicates that dimensions are determined for a pixel block
210b (e.g., 15x15 pixels) to be applied to object 204b and a search area 212 about
object 204a. Pixel block 210b defines a region about each pixel in object 204b for
which region a corresponding pixel block 210a is identified in object 204a. Search
area 212 establishes a region within which corresponding pixel block 210a is sought.
Preferably, pixel block 210b and search area 212 are right regular arrays of pixels
and of sizes defined by the user.

Function block 214 indicates that an initial pixel 216 in object 204b is
identified and designated the current pixel. Initial pixel 216 may be defined by any
of a variety of criteria such as, for example, the pixel at the location of greatest
vertical extent and minimum horizontal extent. With the pixels on display screen 50
arranged according to a coordinate axis 220 as shown, initial pixel 216 may be
represented as the pixel of object 204b having a maximum y-coordinate value and a
minimum x-coordinate value.

Function block 222 indicates that pixel block 210b is centered at and extends 
about the current pixel. 

Function block 224 represents an inquiry as to whether pixel block 210b
includes pixels that are not included in object 204b (e.g., pixels 226 shown by cross-
hatching in Fig. 7B). This inquiry is made with reference to the objects identified
according to function block 206. Whenever pixels within pixel block 210b
positioned at the current pixel fall outside object 204b, function block 224 proceeds
to function block 228 and otherwise proceeds to function block 232.

Function block 228 indicates that pixels of pixel block 210b falling outside
object 204b (e.g., pixels 226) are omitted from the region defined by pixel block
210b so that it includes only pixels within object 204b. As a result, pixel block
210b defines a region that typically would be of a polygonal shape more complex
than the originally defined square or rectangular region.

Function block 232 indicates that a pixel in object 204a is identified as
corresponding to the current pixel in object 204b. The pixel in object 204a is
referred to as the prior corresponding pixel. Preferably, the prior corresponding
pixel is identified by forming a pixel block 210a about each pixel in search area 212
and determining a correlation between the pixel block 210a and pixel block 210b
about the current pixel in object 204b. Each correlation between pixel blocks 210a
and 210b may be determined, for example, by an absolute error. The prior
corresponding pixel is identified by identifying the pixel block 210a in search area
212 for which the absolute error relative to pixel block 210b is minimized. A
summed absolute error E for a pixel block 210a relative to pixel block 210b may be
determined as:

$$E = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( |r_{ij} - r'_{ij}| + |g_{ij} - g'_{ij}| + |b_{ij} - b'_{ij}| \right),$$

in which the terms $r_{ij}$, $g_{ij}$, and $b_{ij}$ correspond to the respective red, green, and blue
color components associated with each of the pixels in pixel block 210b and the
terms $r'_{ij}$, $g'_{ij}$, and $b'_{ij}$ correspond to the respective red, green, and blue color
components associated with each of the pixels in pixel block 210a.
As set forth above, the summations for the absolute error E imply pixel
blocks having pixel arrays of $m \times n$ pixel dimensions. Pixel blocks 210b of
polygonal configuration are accommodated relatively simply by, for example,
defining zero values for the color components of all pixels outside polygonal pixel
blocks 210b.
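The summed absolute error E, restricted to a polygonal block, can be sketched in Python as follows. The sketch skips current-block pixels that fall outside the object rather than zero-filling them, which is one way to realize the polygonal block; frames are assumed to be mappings from (x, y) to (r, g, b), and pixels missing from a frame are read as black. The optional mask argument lets the same function compute the conventional square-block error for comparison. The name `block_sad` is hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Pt = Tuple[int, int]
RGB = Tuple[int, int, int]

def block_sad(curr: Dict[Pt, RGB], prev: Dict[Pt, RGB], center: Pt, cand: Pt,
              half: int, obj_mask: Optional[Set[Pt]] = None) -> int:
    """Summed absolute error over a (2*half+1)^2 block around `center` vs `cand`.

    With obj_mask given, current-block pixels outside the object are skipped
    (polygonal block); without it, the full square block is used.
    """
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            q = (center[0] + dx, center[1] + dy)
            if obj_mask is not None and q not in obj_mask:
                continue                                   # omit pixels outside the object
            a = curr.get(q, (0, 0, 0))                     # missing pixels read as black
            b = prev.get((cand[0] + dx, cand[1] + dy), (0, 0, 0))
            total += sum(abs(u - v) for u, v in zip(a, b))
    return total
```

For example, for a one-pixel object that moved one pixel right while its background changed color, the restricted error at the true prior position is zero, whereas the square block accumulates error from the changed background pixels beside the perimeter.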

Function block 234 indicates that a motion vector MV between each pixel in
object 204b and the corresponding prior pixel in object 204a is determined. A
motion vector is defined as the difference between the locations of the pixel in
object 204b and the corresponding prior pixel in object 204a:

$$MV = (x_i - x_i',\; y_j - y_j'),$$

in which the terms $x_i$ and $y_j$ correspond to the respective x- and y-coordinate
positions of the pixel in pixel block 210b, and the terms $x_i'$ and $y_j'$ correspond to the
respective x- and y-coordinate positions of the corresponding prior pixel in pixel
block 210a.

Function block 236 represents an inquiry as to whether object 204b includes 
any remaining pixels. Whenever object 204b includes remaining pixels, function 
block 236 proceeds to function block 238 and otherwise proceeds to end block 240. 

Function block 238 indicates that a next pixel in object 204b is identified
according to a predetermined format or sequence. With the initial pixel selected as
described above in reference to function block 214, subsequent pixels may be
defined by first identifying the next adjacent pixel in a row (i.e., of a common y-
coordinate value) and, if object 204b includes no other pixels in a row, proceeding to
the first or left-most pixel (i.e., of minimum x-coordinate value) in a next lower
row. The pixel so identified is designated the current pixel and function block 238
returns to function block 222.
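Putting the steps of polygon match process 200 together, a compact Python sketch might look like the following. Assumptions: frames map (x, y) to (r, g, b); the search area is a square of ±`search` pixels centered on the current pixel's own coordinates; candidates are limited to pixels of the prior object; and ties keep the first candidate found. The default `half=7` gives the 15x15 block used as an example above, while the default search size is arbitrary. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pt = Tuple[int, int]
RGB = Tuple[int, int, int]

def scan_order(obj: Set[Pt]) -> List[Pt]:
    """Max y / min x first, left to right along a row, then the next lower row."""
    return sorted(obj, key=lambda p: (-p[1], p[0]))

def _masked_sad(curr, prev, mask, c: Pt, p: Pt, half: int) -> int:
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            q = (c[0] + dx, c[1] + dy)
            if q in mask:                       # polygonal block: object pixels only
                a = curr[q]
                b = prev.get((p[0] + dx, p[1] + dy), (0, 0, 0))
                total += sum(abs(u - v) for u, v in zip(a, b))
    return total

def polygon_match(curr: Dict[Pt, RGB], curr_mask: Set[Pt],
                  prev: Dict[Pt, RGB], prev_mask: Set[Pt],
                  half: int = 7, search: int = 8) -> Dict[Pt, Pt]:
    """Dense motion vectors MV = (x - x', y - y') for every pixel of the current object."""
    motion: Dict[Pt, Pt] = {}
    for pix in scan_order(curr_mask):
        best = None                             # (error, prior corresponding pixel)
        for sy in range(-search, search + 1):
            for sx in range(-search, search + 1):
                cand = (pix[0] + sx, pix[1] + sy)
                if cand not in prev_mask:
                    continue
                err = _masked_sad(curr, prev, curr_mask, pix, cand, half)
                if best is None or err < best[0]:
                    best = (err, cand)
        if best is not None:
            motion[pix] = (pix[0] - best[1][0], pix[1] - best[1][1])
    return motion
```

This brute-force form costs roughly one block comparison per search position per pixel, which is the expense the modified correlation process is designed to reduce.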

Polygon match method 200 accurately identifies corresponding pixels even if
they are located at or near an object perimeter. A significant source of error in
conventional block matching processes is eliminated by omitting or disregarding
pixels of pixel blocks 210b falling outside object 204b. Conventional block
matching processes rigidly apply a uniform pixel block configuration and are not
applied with reference to a segmented object. The uniform block configurations
cause significant errors for pixels adjacent the perimeter of an object because the
pixels outside the object can undergo significant changes as the object moves or its
background changes. With such extraneous pixel variations included in conventional
block matching processes, pixels in the vicinity of an object perimeter cannot be
correlated accurately with the corresponding pixels in prior image frames.

For each pixel in object 204b, a corresponding prior pixel in object 204a is
identified by comparing pixel block 210b with a pixel block 210a for each of the
pixels in prior object 204a. The corresponding prior pixel is the pixel in object 204a
having the pixel block 210a that best correlates to pixel block 210b. If processed in
a conventional manner, such a determination can require substantial computation to
identify each corresponding prior pixel. To illustrate, for pixel blocks having
dimensions of $n \times n$ pixels, which are significantly smaller than a search area 212
having dimensions of $m \times m$ pixels, approximately $n^2 \times m^2$ calculations are required to
identify each corresponding prior pixel in the prior object 204a.

PIXEL BLOCK CORRELATION PROCESS 

Fig. 8 is a functional block diagram of a modified pixel block correlation
process 260 that preferably is substituted for the one described with reference to
function block 232. Modified correlation process 260 utilizes redundancy inherent
in correlating pixel blocks 210b and 210a to significantly reduce the number of
calculations required.

Correlation process 260 is described with reference to Figs. 9A-9G and 10A-
10G, which schematically represent arbitrary groups of pixels corresponding to
successive image frames 202a and 202b. In particular, Fig. 9A is a schematic
representation of a pixel block 262 having dimensions of 5x5 pixels in which each
letter corresponds to a different pixel. The pixels of pixel block 262 are arranged as
a right regular array of pixels that includes distinct columns 264. Fig. 9B represents
an array of pixels 266 having dimensions of $q \times q$ pixels and corresponding to a
search area 212 in a prior image frame 202a. Each of the numerals in Fig. 9B
represents a different pixel. Although described with reference to a conventional
right regular pixel block 262, correlation process 260 is similarly applicable to
polygonal pixel blocks of the type described with reference to polygon match
process 200.

Function block 268 indicates that an initial pixel block (e.g., pixel block 262)
is defined with respect to a central pixel M and scanned across a search area 212
(e.g., pixel array 266) generally in a raster pattern (partly shown in Fig. 7A) as in a
conventional block match process. Figs. 9C-9G schematically illustrate five of the
approximately $q^2$ steps in the block matching process between pixel block 262 and
pixel array 266.

Although the scanning of pixel block 262 across pixel array 266 is performed
in a conventional manner, computations relating to the correlation between them are
performed differently according to this invention. In particular, a correlation (e.g.,
an absolute error) is determined and stored for each column 264 of pixel block 262
in each scan position. The correlation that is determined and stored for each column
264 of pixel block 262 in each scanned position is referred to as a column
correlation 270, several of which are symbolically indicated in Figs. 9C-9G by
referring to the correlated pixels. To illustrate, Fig. 9C shows a column correlation
270(1) that is determined for the single column 264 of pixel block 262 aligned with
pixel array 266. Similarly, Fig. 9D shows column correlations 270(2) and 270(3)
that are determined for the two columns 264 of pixel block 262 aligned with pixel
array 266. Figs. 9E-9G show similar column correlations with pixel block 262 at
three exemplary subsequent scan positions relative to pixel array 266.

The scanning of initial pixel block 262 over pixel array 266 provides a stored
array or database of column correlations. With pixel block 262 having r-number of
columns 264, and pixel array 266 having $q \times q$ pixels, the column correlation database
includes approximately $rq^2$ column correlations. This number of column
correlations is only approximate because pixel block 262 preferably is initially
scanned across pixel array 266 such that pixel M is aligned with the first row of
pixels in pixel array 266.

The remaining steps beginning with the one indicated in Fig. 9C occur after
two complete scans of pixel block 262 across pixel array 266 (i.e., with pixel M
aligned with the first and second rows of pixel array 266).
Function block 274 indicates that a next pixel block 276 (Fig. 10A) is
defined from, for example, image frame 202b with respect to a central pixel N in the
same row as pixel M. Pixel block 276 includes a column 278 of pixels not included
in pixel block 262 and columns 280 of pixels included in pixel block 262. Pixel
block 276 does not include a column 282 (Fig. 9A) that was included in pixel block
262. Such an incremental definition of next pixel block 276 is substantially the
same as that used in conventional block matching processes.

Function block 284 indicates that pixel block 276 is scanned across pixel
array 266 in the manner described above with reference to function block 268. As
with Figs. 9C-9G, Figs. 10B-10G represent the scanning of pixel block 276 across
pixel array 266.

Function block 286 indicates that for column 278 a column correlation is
determined and stored at each scan position. Accordingly, column correlations
288(1)-288(5) are made with respect to the scanned positions of column 278 shown
in respective Figs. 10B-10F.

Function block 290 indicates that for each of columns 280 in pixel block 276
a stored column correlation is retrieved for each scan position previously
computed and stored in function block 268. For example, column correlation 270(1)
of Fig. 9C is the same as column correlation 270'(1) of Fig. 10C. Similarly, column
correlations 270'(2), 270'(3), 270'(5)-270'(8), and 270'(15)-270'(18) of Figs. 10D-
10F are the same as the corresponding column correlations in Figs. 9D, 9E, and 9G.
For pixel block 276, therefore, only one column correlation 288 is calculated for
each scan position. As a result, because only one of the five columns is computed, the
number of calculations required for pixel block 276 is reduced by nearly 80 percent.
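The column-correlation reuse can be sketched in Python for one row of blocks. Assumptions: grayscale intensities instead of RGB, blocks identified by their top-left corner rather than a central pixel, and a dictionary cache keyed by (current column, prior column, prior top row). Because a block shifted one column right shares all but one column with its predecessor, only the new column's correlation is computed; the returned count of computed column correlations makes the saving visible. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # grayscale intensities, indexed [row][col]

def row_of_block_sads(curr: Grid, prior: Grid, by: int, k: int):
    """SAD of every k x k block with top row `by` in `curr`, at every scan position in `prior`.

    Returns (sads keyed by (bx, s0, t), number of column correlations actually computed).
    """
    rows_p, cols_p = len(prior), len(prior[0])
    cache: Dict[Tuple[int, int, int], int] = {}

    def col_corr(u: int, s: int, t: int) -> int:
        # Column correlation: current column u vs prior column s, prior rows t..t+k-1.
        key = (u, s, t)
        if key not in cache:
            cache[key] = sum(abs(curr[by + i][u] - prior[t + i][s]) for i in range(k))
        return cache[key]

    sads: Dict[Tuple[int, int, int], int] = {}
    for bx in range(len(curr[0]) - k + 1):      # successive blocks along the row
        for t in range(rows_p - k + 1):         # raster scan over the search array
            for s0 in range(cols_p - k + 1):
                sads[(bx, s0, t)] = sum(col_corr(bx + j, s0 + j, t) for j in range(k))
    return sads, len(cache)
```

The cache trades memory for arithmetic, as the text notes; away from the left edge of the search array, each new block position computes one fresh column correlation instead of k.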

Function block 292 indicates that a subsequent pixel block 294 (Fig. 11A) is
defined with respect to a central pixel R in the next successive row relative to pixel
M. Pixel block 294 includes columns 296 of pixels that are similar to but distinct
from columns 264 of pixels in pixel block 262 of Fig. 9A. In particular, columns
296 include pixels A'-E' not included in columns 264. Such an incremental
definition of subsequent pixel block 294 is substantially the same as that used in
conventional block matching processes.
Function block 298 indicates that pixel block 294 is scanned across pixel
array 266 (Fig. 9B) in the manner described above with reference to function blocks
268 and 284. Figs. 11B-11F represent the scanning of pixel block 294 across pixel
array 266.

Function block 300 indicates that a column correlation is determined and
stored for each of columns 296. Accordingly, column correlations 302(1)-302(18)
are made with respect to the scanned positions of columns 296 shown in Figs. 11B-
11F.

Each of column correlations 302(1)-302(18) may be calculated in an
abbreviated manner with reference to column correlations made with respect to pixel
block 262 (Fig. 9A).

For example, column correlations 302(4)-302(8) of Fig. 11D include
subcolumn correlations 304'(4)-304'(8) that are the same as subcolumn correlations
304(4)-304(8) of Fig. 9E. Accordingly, column correlations 302(4)-302(8) may be
determined from respective column correlations 270(4)-270(8) by subtracting from
the latter correlation values for pixel pairs 01A, 02B, 03C, 04D, and 05E to form
subcolumn correlations 304(4)-304(8), respectively. Column correlations 302(4)-
302(8) may be obtained by adding correlation values for the pixel pairs 56A', 57B',
58C', 59D', and 50E' to the respective subcolumn correlation values 304(4)-304(8),
respectively.

The determination of column correlations 302(4)-302(8) from respective
column correlations 270(4)-270(8) entails subtracting individual pixel correlation
values corresponding to the row of pixels A-E of pixel block 262 not included in
pixel block 294, and adding pixel correlation values for the row of pixels A'-E'
included in pixel block 294 but not in pixel block 262. For each of column
correlations 302(4)-302(8), this method substitutes one subtraction and one addition
for the five additions that would be required to determine each column correlation in
a conventional manner. With pixel blocks of larger dimensions, as are preferred, the
improvement of this method over conventional calculation methods is even greater.
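The one-subtraction, one-addition update for the next row can be sketched the same way. It uses the same sum-of-absolute-differences assumption as before, and `column_sad` and `shift_column_sad_down` are hypothetical names. Moving a block down by one row drops the top pixel pair of each column and adds the pixel pair just below the old bottom.

```python
def column_sad(frame_a, frame_b, top, size, x, dy, dx):
    """Direct column correlation: SAD over rows top .. top+size-1 of column x,
    compared against frame_b displaced by (dy, dx)."""
    return sum(
        abs(frame_a[y][x] - frame_b[y + dy][x + dx]) for y in range(top, top + size)
    )


def shift_column_sad_down(prev_sad, frame_a, frame_b, top, size, x, dy, dx):
    """Column correlation for rows top+1 .. top+size, derived from the value for
    rows top .. top+size-1 with one subtraction and one addition."""
    leaving = abs(frame_a[top][x] - frame_b[top + dy][x + dx])  # row A-E pair
    entering = abs(frame_a[top + size][x] - frame_b[top + size + dy][x + dx])  # row A'-E' pair
    return prev_sad - leaving + entering
```

For a column of height h, the direct form costs h absolute differences while the update costs two. The advantage therefore grows with block size.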

Conventional block matching processes identify only total block correlations for each
scan position of initial pixel block 262 relative to pixel array 266. As a
consequence, all correlation values for all pixels must be calculated separately for
each scan position. In contrast, correlation process 260 utilizes stored column
correlations 270 to significantly reduce the number of calculations required. The
improvements in speed and processor resource requirements provided by correlation
process 260 more than offset the system requirements for storing the column
correlations.

It will be appreciated that correlation process 260 has been described with
reference to Figs. 9-11 to illustrate specific features of this invention. As shown in
the illustrations, this invention includes recurring or cyclic features that are
particularly suited to execution by a computer system. These recurring or cyclic
features are dependent upon the dimensions of pixel blocks and pixel arrays and are
well understood and can be implemented by persons skilled in the art.

MULTI-DIMENSIONAL TRANSFORMATION 

Fig. 12 is a functional block diagram of a transformation method 350 that
includes generating a multi-dimensional transformation between objects in first and
second successive image frames and quantizing the mapping for transmission or
storage. The multi-dimensional transformation preferably is utilized in connection
with function block 96 of Fig. 3. Transformation method 350 is described with
reference to Fig. 7A and Fig. 13, the latter of which, like Fig. 7B, is a simplified
representation of display screen 50 showing image frame 202b in which image
feature 204 is rendered as object 204b.

Transformation method 350 preferably provides a multi-dimensional affine
transformation capable of representing complex motion that includes any or all of
translation, rotation, magnification, and shear. Transformation method 350 provides
a significant improvement over conventional video compression methods such as
MPEG-1, MPEG-2, and H.26X, which are of only one dimension and represent only
translation. In this regard, the dimensionality of a transformation refers to the
number of coordinates in the generalized form of the transformation, as described
below in greater detail. Representing complex motion more accurately according to
this invention results in fewer errors than conventional representations produce,
thereby increasing compression efficiency.

Function block 352 indicates that a dense motion estimation of the pixels in
objects 204a and 204b is determined. Preferably, the dense motion estimation is
obtained by polygon match process 200. As described above, the dense motion
estimation includes motion vectors between pixels at coordinates $(x_i, y_i)$ in object
204b of image frame 202b and corresponding pixels at locations $(x_i', y_i')$ of object
204a in image frame 202a.

Function block 354 indicates that an array of transformation blocks 356 is
defined to encompass object 204b. Preferably, transformation blocks 356 are right
regular arrays of pixels having dimensions of, for example, 32x32 pixels.

Function block 358 indicates that a multi-dimensional affine transformation is
generated for each transformation block 356. Preferably, the affine transformations
are of first order and represented as:

$$x_i' = a x_i + b y_i + c$$
$$y_i' = d x_i + e y_i + f,$$

and are determined with reference to all pixels for which the motion vectors have a
relatively high confidence. These affine transformations are of two dimensions in
that $x_i'$ and $y_i'$ are defined relative to two coordinates: $x_i$ and $y_i$.

The relative confidence of the motion vectors refers to the accuracy with
which the motion vector between corresponding pixels can be determined uniquely
relative to other pixels. For example, motion vectors between particular pixels that
are in relatively large pixel arrays and are uniformly colored (e.g., black) cannot
typically be determined accurately. In particular, for a black pixel in a first image
frame, many pixels in the pixel array of the subsequent image frame will have the
same correlation (i.e., absolute value error between pixel blocks).

In contrast, pixel arrays in which pixels correspond to distinguishing features 
typically will have relatively high correlations for particular corresponding pixels in 
successive image frames. 

The relatively high correlations are preferably represented as a minimal
absolute value error determination for a particular pixel. Motion vectors of relatively
high confidence may, therefore, be determined relative to such uniquely low error
values. For example, a high confidence motion vector may be defined as one in
which the minimum absolute value error for the motion vector is less than the next
greater error value associated with the pixel by a difference amount that is greater
than a threshold difference amount. Alternatively, high confidence motion vectors
may be defined with respect to the second order derivative of the absolute error
values upon which the correlations are determined. A second order derivative of
more than a particular value would indicate a relatively high correlation between
specific corresponding pixels.
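Both confidence tests can be illustrated on a one-dimensional error profile. This assumes that the absolute value errors for a pixel are listed in displacement order. The threshold of 20 and the sample profiles are made-up values, and `is_high_confidence` and `curvature_at_minimum` are hypothetical helpers.

```python
def is_high_confidence(errors, threshold):
    """Margin test: the smallest absolute value error must beat the next
    greater error by more than `threshold`."""
    if len(errors) < 2:
        return False
    best, runner_up = sorted(errors)[:2]
    return runner_up - best > threshold


def curvature_at_minimum(errors):
    """Alternative test: discrete second derivative of a 1-D error profile
    (errors in displacement order) at its minimum."""
    k = min(range(len(errors)), key=errors.__getitem__)
    if k == 0 or k == len(errors) - 1:
        return 0  # minimum on the boundary: no two-sided curvature, treat as flat
    return errors[k - 1] - 2 * errors[k] + errors[k + 1]


flat = [50, 50, 50, 50, 50]   # uniformly coloured region: many equal matches
sharp = [90, 60, 5, 70, 95]   # distinctive feature: one clear best match
print(is_high_confidence(flat, 20), is_high_confidence(sharp, 20))  # False True
print(curvature_at_minimum(sharp))  # 120
```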

With n-number of pixels with such high-confidence motion vectors, the
preferred affine transformation equations are solved with reference to n-number of
corresponding pixels in image frames 202a and 202b. Image frames 202a and 202b
must include at least three corresponding pixels with high-confidence motion vectors
to solve for the six unknown coefficients a, b, c, d, e, and f of the preferred affine
transformation equations. With the preferred dimensions, each of transformation
blocks 356 includes $2^{10}$ (1024) pixels, of which significant numbers typically have
relatively high-confidence motion vectors. Accordingly, the affine transformation
equations are over-determined in that significantly more pixels are available than are
needed to solve for the coefficients a, b, c, d, e, and f.

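The resulting n-number of equations may be represented by the linear
algebraic expression:

$$\begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ \vdots & \vdots & \vdots \\ x_n & y_n & 1 \end{bmatrix} \begin{bmatrix} a & d \\ b & e \\ c & f \end{bmatrix} = \begin{bmatrix} x_1' & y_1' \\ x_2' & y_2' \\ \vdots & \vdots \\ x_n' & y_n' \end{bmatrix}$$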
Preferably these equations are solved by a conventional singular value
decomposition (SVD) method, which provides a minimal least-square error for the
approximation of the dense motion vectors. A conventional SVD method is
described, for example, in Numerical Recipes in C, by Press et al., Cambridge
University Press (1992).
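The over-determined fit can be sketched in dependency-free Python. This sketch swaps in the normal equations, solved by Cramer's rule, for the SVD method named above. Both give the same least-squares coefficients when the pixels are well spread, but SVD is the more robust choice for ill-conditioned configurations. `fit_affine` and its input format are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, v):
    """Solve the 3x3 system m u = v by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate pixel configuration (e.g. collinear pixels)")
    result = []
    for col in range(3):
        mc = [r[:] for r in m]
        for i in range(3):
            mc[i][col] = v[i]  # replace one column with the right-hand side
        result.append(det3(mc) / d)
    return result


def fit_affine(pairs):
    """Least-squares fit of x' = a x + b y + c and y' = d x + e y + f.

    `pairs` is a list of ((x, y), (x_prime, y_prime)) for high-confidence
    pixels; at least three non-collinear pixels are required.
    """
    normal = [[0.0] * 3 for _ in range(3)]  # sum of r r^T with r = (x, y, 1)
    rhs_x = [0.0] * 3                       # sum of r * x'
    rhs_y = [0.0] * 3                       # sum of r * y'
    for (x, y), (xp, yp) in pairs:
        r = (x, y, 1.0)
        for i in range(3):
            rhs_x[i] += r[i] * xp
            rhs_y[i] += r[i] * yp
            for j in range(3):
                normal[i][j] += r[i] * r[j]
    a, b, c = solve3(normal, rhs_x)  # x' and y' share the same normal matrix
    d, e, f = solve3(normal, rhs_y)
    return a, b, c, d, e, f
```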

As described above, the preferred two-dimensional affine transformation
equations are capable of representing translation, rotation, magnification, and shear
of transformation blocks 356 between successive image frames 202a and 202b. In
contrast, conventional motion transformation methods used in prior compression
standards employ simplified transformation equations of the form:

$$x_i' = x_i + g$$
$$y_i' = y_i + h$$
The prior simplified transformation equations represent motion by only two
coefficients, g and h, which represent only one-third the amount of information
(i.e., coefficients) obtained by the preferred multi-dimensional transformation
equations. To obtain superior compression of the information obtained by
transformation method 350 relative to conventional compression methods,
transformation blocks 356 preferably encompass more than three times as many
pixels as the corresponding 16x16 pixel blocks employed in MPEG-1 and MPEG-2
compression methods. The preferred 32x32 pixel dimensions of transformation
blocks 356 encompass four times the number of pixels employed in the
transformation blocks of conventional transformation methods. The larger
dimensions of transformation blocks 356, together with the improved accuracy with
which the affine transformation coefficients represent motion of the transformation
blocks 356, allow transformation method 350 to provide greater compression than
conventional compression methods.

It will be appreciated that the affine coefficients generated according to the
present invention typically would be non-integer, floating point values that could be
difficult to compress adequately without adversely affecting their accuracy.
Accordingly, it is preferable to quantize the affine transformation coefficients to
reduce the bandwidth required to store or transmit them.

Function block 362 indicates that the affine transformation coefficients 
generated with reference to function block 358 are quantized to reduce the 
bandwidth required to store or transmit them. Fig. 14 is an enlarged fragmentary 
representation of a transformation block 356 showing three selected pixels, 364a, 
364b, and 364c from which the six preferred affine transformation coefficients a-f 
may be determined. 

Pixels 364a-364c are represented as pixel coordinates $(x_1, y_1)$, $(x_2, y_2)$, and
$(x_3, y_3)$, respectively. Based upon the dense motion estimation of function block
352, pixels 364a-364c have respective corresponding pixels $(x_1', y_1')$, $(x_2', y_2')$,
and $(x_3', y_3')$ in preceding image frame 202a. As is conventional, pixel locations
$(x_i, y_i)$ are represented by integer values and are solutions to the affine
transformation equations upon which the preferred affine transformation coefficients
are based. Accordingly, selected pixels 364a-364c are used to calculate the
corresponding pixels from the preceding image frame 202a, which typically will be
floating point values.

Quantization of these floating point values is performed by converting to
integer format the difference between corresponding pixels $(x_i - x_i', y_i - y_i')$. The
affine transformation coefficients are determined by first calculating the pixel values
$(x_i', y_i')$ from the difference vectors and the pixel values $(x_i, y_i)$, and then solving
the multi-dimensional transformation equations of function block 358 with respect to
the pixel values $(x_i', y_i')$.

As shown in Fig. 14, pixels 364a-364c preferably are distributed about
transformation block 356 to minimize the sensitivity of the quantization to local
variations within transformation block 356. Preferably, pixel 364a is positioned at
or adjacent the center of transformation block 356, and pixels 364b and 364c are
positioned at upper corners. Also in the preferred embodiment, the selected pixels
for each of the transformation blocks 356 in object 204b have the same positions,
thereby allowing the quantization process to be performed efficiently.
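A sketch of the quantization round trip follows. It assumes, hypothetically, that pixel 364a is the block centre and that 364b and 364c are the two upper corners of a 32x32 block. It also assumes that "converting to integer format" means rounding to the nearest integer. All helper names are illustrative only. The decoder never sees the floating point coefficients: it rebuilds them from three integer difference vectors. As a result, each control pixel is mapped to within half a pixel of its original position.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, v):
    """Solve the 3x3 system m u = v by Cramer's rule."""
    d = det3(m)
    result = []
    for col in range(3):
        mc = [r[:] for r in m]
        for i in range(3):
            mc[i][col] = v[i]
        result.append(det3(mc) / d)
    return result


def control_pixels(size):
    # Hypothetical placement: 364a near the centre, 364b and 364c at the two
    # upper corners, in (x, y) coordinates with y increasing downward.
    return [(size // 2, size // 2), (0, 0), (size - 1, 0)]


def quantize_affine(coeffs, size=32):
    """Encode (a, b, c, d, e, f) as integer difference vectors (x - x', y - y')
    at the three control pixels."""
    a, b, c, d, e, f = coeffs
    diffs = []
    for x, y in control_pixels(size):
        x_prev = a * x + b * y + c
        y_prev = d * x + e * y + f
        diffs.append((round(x - x_prev), round(y - y_prev)))
    return diffs


def dequantize_affine(diffs, size=32):
    """Rebuild (x', y') from the integer differences, then solve three
    equations per coordinate for the six coefficients."""
    pts = control_pixels(size)
    m = [[x, y, 1] for x, y in pts]
    x_prev = [p[0] - q[0] for p, q in zip(pts, diffs)]
    y_prev = [p[1] - q[1] for p, q in zip(pts, diffs)]
    a, b, c = solve3(m, x_prev)
    d, e, f = solve3(m, y_prev)
    return a, b, c, d, e, f
```

A pure translation such as (1, 0, 3, 0, 1, -2) survives the round trip exactly. General coefficients are recovered only to the precision the integer difference vectors allow.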

Another aspect of the quantization method of function block 362 is that 
different levels of quantization may be used to represent varying degrees of motion. 
As a result, relatively simple motion (e.g., translation) may be represented by fewer
selected pixels 364 than are required to represent complex motion. With respect to
the affine transformation equations described above, pixel 364a $(x_1, y_1)$ from object
204b and the corresponding pixel $(x_1', y_1')$ from object 204a are sufficient to solve
simplified affine transformation equations of the form:

$$x_1' = x_1 + c$$
$$y_1' = y_1 + f,$$

which represent translation between successive image frames. Pixel 364a
specifically is used because its central position generally represents translational
motion independent of the other types of motion. Accordingly, a user may
selectively represent simplified motion such as translation with simplified affine
transformation equations that require one-third the data required to represent
complex motion.

Similarly, a pair of selected pixels $(x_1, y_1)$ (e.g., pixel 364a) and $(x_2, y_2)$ (i.e.,
either of pixels 364b and 364c) from object 204b and the corresponding pixels
$(x_1', y_1')$ and $(x_2', y_2')$ from object 204a are sufficient to solve simplified affine
transformation equations of the form:

$$x_i' = a x_i + c$$
$$y_i' = e y_i + f,$$

which are capable of representing motions that include translation and magnification
between successive image frames. In the simplified form:

$$x' = a\cos\theta \, x + \sin\theta \, y + c$$
$$y' = -\sin\theta \, x + a\cos\theta \, y + f,$$

the corresponding pairs of selected pixels are capable of representing motions that
include translation, rotation, and isotropic magnification. In this simplified form, the
common coefficients of the x and y variables allow the equations to be solved by
two corresponding pairs of pixels.

Accordingly, a user may selectively represent moderately complex motion
that includes translation, rotation, and magnification with partly simplified affine
transformation equations. Such equations would require two-thirds the data required
to represent complex motion. Adding the third selected pixel $(x_3, y_3)$ from object
204b, the corresponding pixel $(x_3', y_3')$ from object 204a, and the complete preferred
affine transformation equations allows a user also to represent shear between
successive image frames.

A preferred embodiment of transformation method 350 (Fig. 12) is described
as using uniform transformation blocks 356 having dimensions of, for example,
32x32 pixels. The preferred multi-dimensional affine transformations described with
reference to function block 358 are determined with reference to transformation
blocks 356. It will be appreciated that the dimensions of transformation blocks 356
directly affect the compression ratio provided by this method.

Fewer transformation blocks 356 are required to represent the transformation of an
object between image frames when the blocks have relatively large dimensions than
when they have smaller dimensions. A consequence of uniformly large transformation
blocks 356 is that correspondingly greater error can be introduced for each
transformation block. Accordingly, uniformly sized transformation blocks 356
typically have moderate dimensions to balance these conflicting performance
constraints.
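The two-pixel simplified forms described above have closed-form solutions. The sketch below assumes that the two pixels differ in both coordinates, which holds for the centre pixel paired with either upper corner. It writes the rotation form as x' = p x + q y + c, y' = -q x + p y + f, with p = a cos(theta) and q = sin(theta) to match the document's notation. The function names are hypothetical.

```python
def fit_translation_magnification(src1, dst1, src2, dst2):
    """Solve x' = a x + c, y' = e y + f from two pixel pairs (src -> dst).
    Requires the source pixels to differ in both x and y. Returns (a, c, e, f)."""
    (x1, y1), (x1p, y1p) = src1, dst1
    (x2, y2), (x2p, y2p) = src2, dst2
    a = (x2p - x1p) / (x2 - x1)
    e = (y2p - y1p) / (y2 - y1)
    return a, x1p - a * x1, e, y1p - e * y1


def fit_rotation_magnification(src1, dst1, src2, dst2):
    """Solve x' = p x + q y + c, y' = -q x + p y + f from two pixel pairs.
    In the document's notation p = a*cos(theta) and q = sin(theta).
    Returns (p, q, c, f)."""
    (x1, y1), (x1p, y1p) = src1, dst1
    (x2, y2), (x2p, y2p) = src2, dst2
    dx, dy = x2 - x1, y2 - y1        # differences eliminate c and f
    dxp, dyp = x2p - x1p, y2p - y1p
    norm = dx * dx + dy * dy
    p = (dx * dxp + dy * dyp) / norm
    q = (dy * dxp - dx * dyp) / norm
    c = x1p - p * x1 - q * y1
    f = y1p + q * x1 - p * y1
    return p, q, c, f
```

Because the rotation form shares p and q between the two equations, it has four unknowns. Two pixel pairs therefore supply exactly the four equations needed.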

TRANSFORMATION BLOCK OPTIMIZATION 

Fig. 15 is a functional block diagram of a transformation block optimization 
method 370 that automatically selects transformation block dimensions that provide a 
minimal error threshold. Optimization method 370 is described with reference to 
Fig. 16, which is a simplified representation of display screen 50 showing a portion 
of image frame 202b with object 204b. 

Function block 372 indicates that an initial transformation block 374 is 
defined with respect to object 204b. Initial transformation block 374 preferably is of 
maximal dimensions that are selectable by a user and are, for example, 64x64 pixels. 
Initial transformation block 374 is designated the current transformation block. 

Function block 376 indicates that a current peak signal-to-noise ratio (SNR)
is calculated with respect to the current transformation block. The signal-to-noise
ratio preferably is calculated as the ratio of the variance of the color component
values of the pixels within the current transformation block (i.e., the signal) to the
variance of the color component values of the pixels associated with estimated error
110 (Fig. 3).

Function block 378 indicates that the current transformation block (e.g., 
transformation block 374) is subdivided into, for example, four equal sub-blocks 
380a-380d, affine transformations are determined for each of sub-blocks 380a-380d, 
and a future signal-to-noise ratio is determined with respect to the affine 
transformations. The future signal-to-noise ratio is calculated in substantially the 
same manner as the current signal-to-noise ratio described with reference to function 
block 376. 

Inquiry block 382 represents an inquiry as to whether the future signal-to-noise
ratio is greater than the current signal-to-noise ratio by more than a user-selected
threshold amount. This inquiry represents a determination that further
subdivision of the current transformation block (e.g., transformation block 374)
would improve the accuracy of the affine transformations by at least the threshold
amount. Whenever the future signal-to-noise ratio is greater than the current
signal-to-noise ratio by more than the threshold amount, inquiry block 382 proceeds to
function block 384, and otherwise proceeds to function block 388.

Function block 384 indicates that sub-blocks 380a-380d are successively
designated the current transformation block, and each is analyzed to determine whether
it should be further subdivided. For purposes of illustration, sub-block 380a is
designated the current transformation block, processed according to function block 376,
and further subdivided into sub-blocks 386a-386d. Function block 388 indicates that a
next successive transformation block 374' is identified and designated an initial or
current transformation block.
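The subdivision loop of function blocks 376 to 388 can be sketched recursively. The signal-to-noise ratio follows the variance-ratio definition given above. The `fit` callback is a hypothetical stand-in: it would fit one affine transformation to a block and return that block's colour values and estimated-error values. The toy `demo_fit`, whose error simply shrinks with block size, is equally hypothetical. The minimum block size guard is an added assumption that stops the recursion.

```python
from statistics import pvariance


def snr(signal, error):
    """Variance of the block's colour values over the variance of its error."""
    noise = pvariance(error)
    return float("inf") if noise == 0 else pvariance(signal) / noise


def optimize_blocks(x, y, size, fit, threshold, min_size):
    """Return a list of (x, y, size) transformation blocks.

    fit(x, y, size) -> (signal_values, error_values) for one affine fit.
    """
    signal, error = fit(x, y, size)
    current = snr(signal, error)                     # function block 376
    half = size // 2
    if half < min_size:
        return [(x, y, size)]
    quads = [(x, y), (x + half, y), (x, y + half), (x + half, y + half)]
    sub_signal, sub_error = [], []
    for qx, qy in quads:                             # function block 378
        s, e = fit(qx, qy, half)
        sub_signal += s
        sub_error += e
    future = snr(sub_signal, sub_error)
    if future - current > threshold:                 # inquiry block 382
        blocks = []
        for qx, qy in quads:                         # function block 384
            blocks += optimize_blocks(qx, qy, half, fit, threshold, min_size)
        return blocks
    return [(x, y, size)]


def demo_fit(x, y, size):
    # Illustrative only: constant signal variance, error variance (size/16)^2.
    return [0.0, 2.0] * (size // 2), [0.0, size / 8] * (size // 2)


print(optimize_blocks(0, 0, 64, demo_fit, 0.1, 32))
```

With the toy callback, splitting a 64x64 block raises the ratio from 0.0625 to 0.25. It is therefore subdivided at a threshold of 0.1 and left whole at a threshold of 0.5.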

PRECOMPRESSION EXTRAPOLATION METHOD 

Figs. 17A and 17B are a functional block diagram of a precompression
extrapolation method 400 for extrapolating image features of arbitrary configuration
to a predefined configuration to facilitate compression in accordance with function
block 112 of encoder process 64 (both of Fig. 3). Extrapolation method 400 allows
the compression of function block 112 to be performed in a conventional manner
such as DCT or lattice or other wavelet compression, as described above.

Conventional still image compression methods such as lattice or other 
wavelet compression or discrete cosine transforms (DCT) operate upon rectangular 
arrays of pixels. As described above, however, the methods of the present invention 
are applicable to image features or objects of arbitrary configuration. Extrapolating 
such objects or image features to a rectangular pixel array configuration allows use 
of conventional still image compression methods such as lattice or other wavelet 
compression or DCT. Extrapolation method 400 is described below with reference 
to Figs. 18A-18D, which are representations of display screen 50 on which a simple 
object 402 is rendered to show various aspects of extrapolation method 400. 

Function block 404 indicates that an extrapolation block boundary 406 is 
defined about object 402. Extrapolation block boundary 406 preferably is 
rectangular. Referring to Fig. 18A, the formation of extrapolation block boundary
406 about object 402 is based upon an identification of a perimeter 408 of object 
402 by, for example, object segmentation method 140 (Fig. 4). Extrapolation block 
boundary 406 is shown encompassing object 402 in its entirety for purposes of 
illustration. It will be appreciated that extrapolation block boundary 406 could 
alternatively encompass only a portion of object 402. As described with reference to 
object segmentation method 140, pixels included in object 402 have color component 
values that differ from those of pixels not included in object 402. 

Function block 410 indicates that all pixels 412 bounded by extrapolation 
block boundary 406 and not included in object 402 are assigned a predefined value 
such as, for example, a zero value for each of the color components. 

Function block 414 indicates that horizontal lines of pixels within 
extrapolation block boundary 406 are scanned to identify horizontal lines with 
horizontal pixel segments having both zero and non-zero color component values. 

Function block 416 represents an inquiry as to whether the horizontal pixel 
segments having color component values of zero are bounded at both ends by 
perimeter 408 of object 402. Referring to Fig. 18B, region 418 represents horizontal 
pixel segments having color component values of zero that are bounded at both ends 
by perimeter 408. Regions 420 represent horizontal pixel segments that have color 
component values of zero and are bounded at only one end by perimeter 408. 
Function block 416 proceeds to function block 426 for regions 418 in which the 
pixel segments have color component values of zero bounded at both ends by 
perimeter 408 of object 402, and otherwise proceeds to function block 422. 

Function block 422 indicates that the pixels in each horizontal pixel segment
of a region 420 are assigned the color component values of a pixel 424 (only
exemplary ones shown) located on both the corresponding horizontal line and
perimeter 408 of object 402. Alternatively, the color component values assigned to
the pixels in regions 420 are functionally related to the color component values of
pixels 424.

Function block 426 indicates that the pixels in each horizontal pixel segment 
in region 418 are assigned color component values corresponding to, and preferably 
equal to, an average of the color component values of pixels 428a and 428b that are 
in the corresponding horizontal lines and on perimeter 408.

Function block 430 indicates that vertical lines of pixels within extrapolation 
block boundary 406 are scanned to identify vertical lines with vertical pixel 
segments having both zero and non-zero color component values. 
Function block 432 represents an inquiry as to whether the vertical pixel
segments in vertical lines having color component values of zero are bounded at
both ends by perimeter 408 of object 402. Referring to Fig. 18C, region 434
represents vertical pixel segments having color component values of zero that are
bounded at both ends by perimeter 408. Regions 436 represent vertical pixel
segments that have color component values of zero and are bounded at only one end
by perimeter 408. Function block 432 proceeds to function block 444 for region
434, in which the vertical pixel segments have color component values of zero
bounded at both ends by perimeter 408 of object 402, and otherwise proceeds to
function block 438.

Function block 438 indicates that the pixels in each vertical pixel segment of
region 436 are assigned the color component values of pixels 442 (only exemplary
ones shown) located on both the corresponding vertical lines and perimeter 408 of
object 402. Alternatively, the color component values assigned to the pixels in region
436 are functionally related to the color component values of pixels 442.

Function block 444 indicates that the pixels in each vertical pixel segment in
region 434 are assigned color component values corresponding to, and preferably
equal to, an average of the color component values of pixels 446a and 446b that are
in the corresponding vertical lines and on perimeter 408.

Function block 448 indicates that pixels that are in both horizontal and
vertical pixel segments that are assigned color component values according to this
method are assigned composite color component values that relate to, and preferably
are the average of, the color component values otherwise assigned to the pixels
according to their horizontal and vertical pixel segments.

Examples of pixels assigned such composite color component values are
those pixels in regions 418 and 434.

Function block 450 indicates that regions 452 of pixels bounded by
extrapolation block boundary 406 and not intersecting perimeter 408 of object 402
along a horizontal or vertical line are assigned composite color component values
that are related to, and preferably equal to, the average of the color component
values assigned to adjacent pixels. Referring to Fig. 18D, each of pixels 454 in
regions 452 is assigned a color component value that preferably is the average of the
color component values of pixels 456a and 456b that are aligned with pixel 454
along respective horizontal and vertical lines and have non-zero color component
values previously assigned by this method.
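The horizontal and vertical passes, the composite averaging of function block 448, and the final fill of function block 450 can be combined into one sketch for a single colour component. The sketch assumes that a boolean mask marks the pixels of object 402. It interprets "pixels aligned with pixel 454" as the nearest already-assigned pixel in each of the four directions. Both choices, and all function names, are illustrative assumptions rather than the described implementation.

```python
def fill_line(values, inside):
    """One horizontal or vertical line: a run of non-object pixels bounded by the
    object at both ends gets the average of the two bounding perimeter pixels;
    a run bounded at one end copies that perimeter pixel; a run that never
    meets the object stays None."""
    n = len(values)
    out = [None] * n
    i = 0
    while i < n:
        if inside[i]:
            i += 1
            continue
        j = i
        while j < n and not inside[j]:
            j += 1                       # run of non-object pixels is i .. j-1
        left = values[i - 1] if i > 0 else None
        right = values[j] if j < n else None
        if left is not None and right is not None:
            fill = (left + right) / 2    # bounded at both ends (regions 418, 434)
        else:
            fill = left if left is not None else right  # one end (regions 420, 436)
        for k in range(i, j):
            out[k] = fill
        i = j
    return out


def nearest_assigned(grid, y, x, dy, dx):
    """First non-None value met when stepping from (y, x) in direction (dy, dx)."""
    y, x = y + dy, x + dx
    while 0 <= y < len(grid) and 0 <= x < len(grid[0]):
        if grid[y][x] is not None:
            return grid[y][x]
        y, x = y + dy, x + dx
    return None


def extrapolate(img, mask):
    """Fill the non-object pixels of a rectangular block (mask True = object)."""
    h, w = len(img), len(img[0])
    rows = [fill_line(img[y], mask[y]) for y in range(h)]
    cols = [fill_line([img[y][x] for y in range(h)], [mask[y][x] for y in range(h)])
            for x in range(w)]
    first = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                first[y][x] = img[y][x]
                continue
            found = [v for v in (rows[y][x], cols[x][y]) if v is not None]
            if found:                    # composite value (function block 448)
                first[y][x] = sum(found) / len(found)
    out = [row[:] for row in first]
    for y in range(h):                   # remaining regions 452 (function block 450)
        for x in range(w):
            if first[y][x] is None:
                near = [nearest_assigned(first, y, x, dy, dx)
                        for dy, dx in ((0, -1), (0, 1), (-1, 0), (1, 0))]
                near = [v for v in near if v is not None]
                out[y][x] = sum(near) / len(near) if near else 0
    return out
```

For a 3x3 block whose object is the two corner pixels with values 2 and 8, every filled value lies between 2 and 8. This is the smooth variation that the following paragraph contrasts with zero padding.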

A benefit of object extrapolation process 400 is that it assigns smoothly
varying color component values to pixels not included in object 402 and therefore
optimizes the compression capabilities and accuracy of conventional still image
compression methods. In contrast, prior art zero padding or mirror image methods,
as described by Chang et al., "Transform Coding of Arbitrarily-Shaped Image
Segments," ACM Multimedia, pp. 83-88, June, 1993, apply compression to
extrapolated objects that are filled with pixels having zero color component values,
such as those applied in function block 410. The drastic image change that occurs
between an object and the zero-padded regions introduces high frequency changes
that are difficult to compress or that introduce image artifacts upon compression.
Object extrapolation method 400 overcomes such disadvantages.

ALTERNATIVE ENCODER METHOD 

Fig. 19A is a functional block diagram of an encoder method 500 that 
employs a Laplacian pyramid encoder with unique filters that maintain nonlinear 
aspects of image features, such as edges, while also providing high compression. 

Conventional Laplacian pyramid encoders are described, for example, in "The
Laplacian Pyramid as a Compact Image Code" by Burt and Adelson, IEEE Trans.
Comm., Vol. 31, No. 4, pp. 532-540, April 1983. Encoder method 500 is capable
of providing the encoding described with reference to function block 112 of video
compression encoder process 64 shown in Fig. 3, as well as wherever else DCT or
wavelet encoding is suggested or used. By way of example, encoder method 500 is
described with reference to encoding of estimated error 110 (Fig. 3).
A first decimation filter 502 receives pixel information corresponding to an
estimated error 110 (Fig. 3) and filters the pixels according to a filter criterion. In a
conventional Laplacian pyramid method, the decimation filter is a low-pass filter
such as a Gaussian weighting function. In accordance with encoder method 500,
however, decimation filter 502 preferably employs a median filter and, more
specifically, a 3x3 nonseparable median filter.

To illustrate, Fig. 20A is a simplified representation of the color component
values for one color component (e.g., red) for an arbitrary set or array of pixels 504.
Although described with particular reference to red color component values, this
illustration is similarly applied to the green and blue color component values of
pixels 504.

With reference to the preferred embodiment of decimation filter 502, filter
blocks 506 having dimensions of 3x3 pixels are defined among pixels 504. For each
pixel block 506, the median pixel intensity value is identified or selected. With
reference to pixel blocks 506a-506c, for example, decimation filter 502 provides the
respective values of 8, 9, and 10, which are listed as the first three pixels 512 in
Fig. 20B.

It will be appreciated, however, that decimation filter 502 could employ other
median filters according to this invention. Accordingly, for each group of pixels
having associated color component values of $\{a_1, a_2, \ldots, a_n\}$, the median filter
would select a median value $a_m$.
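A 3x3 nonseparable median filter is short to write. This sketch assumes that border pixels replicate the nearest edge value, a choice the description does not specify, and the name `median3x3` is hypothetical. Unlike a Gaussian low-pass filter, the median removes an isolated spike and leaves a step edge intact.

```python
from statistics import median


def median3x3(img):
    """3x3 nonseparable median filter over a list-of-lists image; border pixels
    replicate the nearest edge value (an assumption)."""
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so the 3x3 window never leaves the image.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [
        [median([px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)])
         for x in range(w)]
        for y in range(h)
    ]
```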

A first 2x2 down sampling filter 514 samples alternate pixels 512 in vertical
and horizontal directions to provide additional compression. Fig. 20C represents a
resulting compressed set of pixels 515.

A 2x2 up sample filter 516 inserts a pixel of zero value in place of each
pixel 512 omitted by down sampling filter 514, and interpolation filter 518 assigns
to the zero-value pixel a pixel value of an average of the opposed adjacent pixels, or
a previously assigned value if the zero-value pixel is not between an opposed pair of
non-zero value pixels. To illustrate, Fig. 20D represents a resulting set or array of
pixels 520.
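The down sampling, up sampling, and interpolation steps can be sketched together. In this hypothetical `up2`, retained samples land on even coordinates and each gap pixel takes the average of its opposed neighbours. Gap pixels on the right or bottom edge have no opposed pair, so they copy the previously assigned neighbour (written here as averaging that neighbour with itself).

```python
def down2(img):
    """2x2 down sampling: keep every other pixel in both directions."""
    return [row[::2] for row in img[::2]]


def up2(img):
    """2x2 up sampling followed by interpolation of the inserted pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]   # zero-value pixels inserted
    for y in range(h):
        for x in range(w):
            right = img[y][x + 1] if x + 1 < w else img[y][x]
            out[2 * y][2 * x] = img[y][x]
            out[2 * y][2 * x + 1] = (img[y][x] + right) / 2  # horizontal gaps
    for y in range(h):
        below = out[2 * y + 2] if y + 1 < h else out[2 * y]
        out[2 * y + 1] = [(a + b) / 2 for a, b in zip(out[2 * y], below)]  # vertical gaps
    return out
```

For example, `up2([[0, 8], [0, 8]])` yields four rows of `[0, 4, 8, 8]`: the gap between 0 and 8 is averaged, and the right-edge gap copies 8.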

A difference 522 is taken between the color component values of the set of
pixels 504 and the corresponding color component values for set of pixels 520 to
form a zero-order image component $I_0$.

A second decimation filter 526 receives color component values
corresponding to the compressed set of pixels 515 generated by first 2x2 down
sampling filter 514. Decimation filter 526 preferably is the same as decimation filter
502 (e.g., a 3x3 nonseparable median filter). Accordingly, decimation filter 526
functions in the same manner as decimation filter 502 and delivers a resulting
compressed set or array of pixels (not shown) to a second 2x2 down sampling filter
528.

Down sampling filter 528 functions in the same manner as down sampling
filter 514 and forms a second-order image component $I_2$ that also is delivered to a
2x2 up sample filter 530 and an interpolation filter 531 that function in the same
manner as up sample filter 516 and interpolation filter 518, respectively. A
difference 532 is taken between the color component values of the set of pixels 515
and the resulting color component values provided by interpolation filter 531 to form
a first-order image component $I_1$.

The image components $I_0$, $I_1$, and $I_2$ are respective $n \times n$,
$\frac{n}{2} \times \frac{n}{2}$, and $\frac{n}{4} \times \frac{n}{4}$ sets of color component values that represent the
color component values for an $n \times n$ array of pixels 504.

Image component $I_0$ maintains the high frequency components (e.g., edges) of
an image represented by the original set of pixels 504. Image components $I_1$ and $I_2$
represent low frequency aspects of the original image. Image components $I_0$, $I_1$, and
$I_2$ provide relative compression of the original image. Image components $I_0$ and $I_1$
maintain high frequency features (e.g., edges) in a format that is highly compressible
due to the relatively high correlation between the values of adjacent pixels. Image
component $I_2$ is not readily compressible because it includes primarily low
frequency image features, but it is a set of relatively small size.
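Putting the pieces together gives a sketch of the three-level encoder. It relies on the same assumptions as the filter sketches: an edge-replicated median, the hypothetical `up2` interpolation rule, and n divisible by 4. Every difference is taken against an interpolated image that the decoder can rebuild. Summing the components back up therefore recovers the input exactly, up to floating point rounding.

```python
from statistics import median


def median3x3(img):
    """3x3 nonseparable median filter with edge replication at the borders."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[median([px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)])
             for x in range(w)] for y in range(h)]


def down2(img):
    """2x2 down sampling: keep every other pixel in both directions."""
    return [row[::2] for row in img[::2]]


def up2(img):
    """2x2 up sampling plus interpolation (average of opposed neighbours,
    copy of the previous neighbour at the right and bottom edges)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            right = img[y][x + 1] if x + 1 < w else img[y][x]
            out[2 * y][2 * x] = img[y][x]
            out[2 * y][2 * x + 1] = (img[y][x] + right) / 2
    for y in range(h):
        below = out[2 * y + 2] if y + 1 < h else out[2 * y]
        out[2 * y + 1] = [(a + b) / 2 for a, b in zip(out[2 * y], below)]
    return out


def subtract(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def pyramid_encode(img):
    """Return (I0, I1, I2) for an n x n image, n divisible by 4."""
    pixels_515 = down2(median3x3(img))     # decimation filter 502, down sampling 514
    i0 = subtract(img, up2(pixels_515))    # difference 522: n x n
    i2 = down2(median3x3(pixels_515))      # decimation filter 526, down sampling 528
    i1 = subtract(pixels_515, up2(i2))     # difference 532: n/2 x n/2
    return i0, i1, i2
```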

Fig. 19B is a functional block diagram of a decoder method 536 that decodes
or inverse encodes image components $I_0$, $I_1$, and $I_2$ generated by encoder method
500. Decoder method 536 includes a first 2x2 up sample filter 538 that receives
image component $I_2$ and interposes a pixel of zero value between each adjacent pair
of pixels. An interpolation filter 539 assigns to the zero-value pixel a pixel value
that preferably is an average of the values of the adjacent pixels, or a previously
assigned value if the zero-value pixel is not between an opposed pair of non-zero-value
pixels. First 2x2 up sample filter 538 operates in substantially the same
manner as up sample filters 516 and 530 of Fig. 19A, and interpolation filter 539
operates in substantially the same manner as interpolation filters 518 and 531.

A sum 540 is determined between image component $I_1$ and the color
component values corresponding to the decompressed set of pixels generated by first
2x2 up sample filter 538 and interpolation filter 539. A second 2x2 up sample filter
542 interposes a pixel of zero value between each adjacent pair of pixels generated
by sum 540. An interpolation filter 543 assigns to the zero-value pixel a pixel value
that includes an average of the values of the adjacent pixels, or a previously assigned
value if the zero-value pixel is not between an opposed pair of non-zero-value
pixels. Up sample filter 542 and interpolation filter 543 are substantially the same
as up sample filter 538 and interpolation filter 539, respectively.

A sum 544 sums the image component I0 with the color component values
corresponding to the decompressed set of pixels generated by second 2x2 up sample
filter 542 and interpolation filter 543. Sum 544 provides decompressed estimated
error 110 corresponding to the estimated error 110 delivered to encoder process 500.
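The reconstruction performed by decoder method 536 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, each component is a square list of lists, each original pixel is assumed to land on an even coordinate of the up-sampled grid, and interpolation is assumed to run along rows first and then along columns (the text does not fix an order).

```python
from typing import List

Grid = List[List[float]]


def upsample_interpolate(src: Grid) -> Grid:
    """2x2 up-sampling with zero insertion, then neighbour-average interpolation.

    Assumed layout: src[i][j] lands at out[2*i][2*j]. Each inserted pixel gets the
    average of its two opposed neighbours or, at the right/bottom edge where no
    opposed neighbour exists, the previously assigned value.
    """
    rows, cols = len(src), len(src[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]  # zero-valued interposed pixels
    for i in range(rows):
        for j in range(cols):
            out[2 * i][2 * j] = float(src[i][j])
    # Horizontal pass on the even rows: fill the odd columns.
    for r in range(0, 2 * rows, 2):
        for c in range(1, 2 * cols, 2):
            left = out[r][c - 1]
            out[r][c] = (left + out[r][c + 1]) / 2 if c + 1 < 2 * cols else left
    # Vertical pass: fill the odd rows from the completed rows above and below.
    for r in range(1, 2 * rows, 2):
        for c in range(2 * cols):
            above = out[r - 1][c]
            out[r][c] = (above + out[r + 1][c]) / 2 if r + 1 < 2 * rows else above
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized grids."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def decode_components(i0: Grid, i1: Grid, i2: Grid) -> Grid:
    """Decoder 536: sum 540 rebuilds the n/2 level, sum 544 the n x n level."""
    level1 = add(i1, upsample_interpolate(i2))    # filters 538/539, sum 540
    return add(i0, upsample_interpolate(level1))  # filters 542/543, sum 544


if __name__ == "__main__":
    i2 = [[0.0]]
    i1 = [[0.0, 2.0], [4.0, 6.0]]
    i0 = [[0.0] * 4 for _ in range(4)]
    for row in decode_components(i0, i1, i2):
        print(row)
```

The right column and bottom row are filled by copying, which mirrors the "previously assigned value" rule for zero-value pixels that have no opposed neighbour.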

TRANSFORM CODING OF MOTION VECTORS 

Conventional video compression encoder processes, such as MPEG-1 or
MPEG-2, utilize only sparse motion vector fields to represent the motion of
significantly larger pixel arrays of a regular size and configuration. The motion
vector fields are sparse in that only one motion vector is used to represent the
motion of a pixel array having dimensions of, for example, 16 x 16 pixels. The
sparse motion vector fields, together with transform encoding of underlying images
or pixels by, for example, discrete cosine transform (DCT) encoding, provide
conventional video compression encoding.

In contrast, video compression encoding process 64 (Fig. 3) utilizes dense
motion vector fields in which motion vectors are determined for all, or virtually all,
pixels of an object. Such dense motion vector fields significantly improve the
accuracy with which motion between corresponding pixels is represented. Although
the increased accuracy can significantly reduce the errors associated with
conventional sparse motion vector field representations, the additional information
included in dense motion vector fields represents an increase in the amount of
information representing a video sequence. In accordance with this invention,
therefore, dense motion vector fields are themselves compressed or encoded to
improve the compression ratio provided by this invention.
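The size of that increase can be illustrated with simple counting. The sketch below is only an illustration: it assumes a region covering a full 720x486 NTSC frame (the resolution cited earlier in this document), counts partial 16 x 16 blocks as whole blocks, and uses hypothetical function names.

```python
import math


def sparse_vector_count(width: int, height: int, block: int = 16) -> int:
    """One motion vector per block x block pixel array (partial blocks rounded up)."""
    return math.ceil(width / block) * math.ceil(height / block)


def dense_vector_count(width: int, height: int) -> int:
    """One motion vector per pixel, as in a dense motion vector field."""
    return width * height


if __name__ == "__main__":
    w, h = 720, 486
    sparse, dense = sparse_vector_count(w, h), dense_vector_count(w, h)
    print(sparse, dense, round(dense / sparse))
```

Under these assumptions the dense field carries roughly 250 times as many vectors as the sparse one, which is why the dense field must itself be compressed.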

Fig. 21 is a functional block diagram of a motion vector encoding process
560 for encoding or compressing motion vector fields and, preferably, dense motion
vector fields such as those generated in accordance with dense motion transformation
96 of Fig. 3. It will be appreciated that such dense motion vector fields from a
selected object typically will have greater continuity or "smoothness" than the
underlying pixels corresponding to the object. As a result, compression or encoding
of the dense motion vector fields will attain a greater compression ratio than would
compression or encoding of the underlying pixels.

Function block 562 indicates that a dense motion vector field is obtained for
an object or a portion of an object in accordance with, for example, the processes of
function block 96 described with reference to Fig. 3. Accordingly, the dense motion
vector field will correspond to an object or other image portion of arbitrary
configuration or size.

Function block 564 indicates that the configuration of the dense motion
vector field is extrapolated to a regular, preferably rectangular, configuration to
facilitate encoding or compression. Preferably, the dense motion vector field
configuration is extrapolated to a regular configuration by precompression
extrapolation method 400 described with reference to Figs. 17A and 17B. It will be
appreciated that conventional extrapolation methods, such as a mirror image method,
could alternatively be utilized.
Function block 566 indicates that the dense motion vector field with its
extrapolated regular configuration is encoded or compressed according to
conventional encoding transformations such as, for example, discrete cosine
transformation (DCT) or lattice or other wavelet compression, the former of which is
preferred.

Function block 568 indicates that the encoded dense motion vector field is
further compressed or encoded by a conventional lossless still image compression
method such as entropy encoding to form an encoded dense motion vector field 570.
Such a still image compression method is described with reference to function block
114 of Fig. 3.
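Function blocks 564 and 566 can be sketched for one component (horizontal or vertical) of a dense motion vector field. This is a simplified illustration under explicit assumptions: samples outside the object are marked `None` and filled with the mean of the samples inside it (a crude stand-in for extrapolation method 400), the transform is a directly computed orthonormal 2-D DCT-II, quantization uses a hypothetical uniform `step`, and the lossless entropy coding of block 568 is not shown.

```python
import math
from typing import List, Optional

Field = List[List[Optional[float]]]
Grid = List[List[float]]


def extrapolate_to_rectangle(comp: Field) -> Grid:
    """Stand-in for method 400: fill samples outside the object (None) with the
    mean of the samples inside it, giving a regular rectangular array."""
    inside = [v for row in comp for v in row if v is not None]
    fill = sum(inside) / len(inside)
    return [[fill if v is None else v for v in row] for row in comp]


def dct2(block: Grid) -> Grid:
    """Orthonormal 2-D DCT-II, written out directly for clarity rather than speed."""
    n, m = len(block), len(block[0])

    def scale(k: int, size: int) -> float:
        return math.sqrt((1 if k == 0 else 2) / size)

    return [
        [
            scale(u, n) * scale(v, m) * sum(
                block[x][y]
                * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                * math.cos(math.pi * (2 * y + 1) * v / (2 * m))
                for x in range(n)
                for y in range(m)
            )
            for v in range(m)
        ]
        for u in range(n)
    ]


def encode_component(comp: Field, step: float) -> List[List[int]]:
    """Blocks 564-566 for one vector component: extrapolate, transform, quantize."""
    coeffs = dct2(extrapolate_to_rectangle(comp))
    return [[round(c / step) for c in row] for row in coeffs]


if __name__ == "__main__":
    # Horizontal component of a uniform motion of 2 pixels; corners lie outside the object.
    dx = [[None, 2.0, 2.0, None],
          [2.0, 2.0, 2.0, 2.0],
          [2.0, 2.0, 2.0, 2.0],
          [None, 2.0, 2.0, None]]
    q = encode_component(dx, step=0.5)
    nonzero = sum(1 for row in q for c in row if c != 0)
    print(q[0][0], nonzero)
```

A smooth field concentrates its energy in a few low-frequency coefficients (here a single DC term), which is the property the text relies on when it says motion vector fields compress better than the underlying pixels.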

COMPRESSION OF QUANTIZED OBJECTS FROM PREVIOUS 
VIDEO FRAMES 

Referring to Fig. 3A, video compression encoder process 64 uses quantized
prior object 126 determined with reference to a prior frame N-1 to encode a
corresponding object in a next successive frame N. As a consequence, encoder
process 64 requires that quantized prior object 126 be stored in an accessible
memory buffer. With conventional video display resolutions, such a memory buffer
would require a capacity of at least one-half megabyte to store the quantized prior
object 126 for a single video frame. Higher resolution display formats would
require correspondingly larger memory buffers.
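The one-half megabyte figure can be checked with simple arithmetic. The text does not state a sampling format, so the sketch assumes 8-bit samples with either 4:2:0 chroma subsampling (12 bits per pixel) or 4:2:2 (16 bits per pixel); the function name is hypothetical.

```python
import math


def frame_buffer_bytes(width: int, height: int, bits_per_pixel: float) -> int:
    """Bytes needed to hold one uncompressed frame-sized object."""
    return math.ceil(width * height * bits_per_pixel / 8)


if __name__ == "__main__":
    # Assumed formats: 12 bits/pixel (8-bit 4:2:0) and 16 bits/pixel (8-bit 4:2:2).
    for w, h in [(720, 486), (1360, 1024)]:
        for bpp in (12, 16):
            print(f"{w}x{h} at {bpp} bits/pixel: {frame_buffer_bytes(w, h, bpp):,} bytes")
```

At 720x486 both assumed formats exceed 2^19 = 524,288 bytes, and the 1360x1024 HDTV resolution cited earlier needs roughly four times as much.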

Fig. 22 is a functional block diagram of a quantized object encoder-decoder
(codec) process 600 that compresses and selectively decompresses quantized prior
objects 126 to reduce the required capacity of a quantized object memory buffer.

Function block 602 indicates that each quantized object 126 in an image
frame is encoded in a block-by-block manner by a lossy encoding or compression
method such as discrete cosine transform (DCT) encoding or lattice sub-band or
other wavelet compression. As shown in Fig. 21, lossy encoded information can
undergo additional lossless encoding. Alternatively, lossless encoding alone can be
used.

Function block 604 indicates that the encoded or compressed quantized 
objects are stored in a memory buffer (not shown). 
Function block 606 indicates that encoded quantized objects are retrieved
from the memory buffer in anticipation of processing a corresponding object in a 
next successive video frame. 

Function block 608 indicates that the encoded quantized object is inverse 
encoded by, for example, DCT or wavelet decoding according to the encoding 
processes employed with respect to function block 602. 

Codec process 600 allows the capacity of the corresponding memory buffer 
to be reduced by up to about 80%, depending upon the overall video compression 
ratio and the desired quality of the resultant video. Moreover, it will be appreciated 
that codec process 600 is similarly applicable to the decoder process corresponding 
to video compression encoder process 64. 
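A minimal sketch of codec process 600 for the lossless-only alternative mentioned above: objects are compressed when stored (blocks 602 and 604) and decompressed when retrieved (blocks 606 and 608). Python's standard `zlib` module stands in for the patent's encoders; the class and method names are hypothetical, and the compression ratio achieved depends entirely on the object data, so it says nothing about the 80% figure above.

```python
import zlib
from typing import Dict


class QuantizedObjectBuffer:
    """Stores quantized prior objects in compressed form until the next frame needs them."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, object_id: str, pixels: bytes) -> int:
        """Blocks 602/604: encode the object losslessly and store it; return stored size."""
        packed = zlib.compress(pixels, 9)
        self._store[object_id] = packed
        return len(packed)

    def get(self, object_id: str) -> bytes:
        """Blocks 606/608: retrieve the encoded object and inverse encode it."""
        return zlib.decompress(self._store[object_id])

    def stored_bytes(self) -> int:
        """Total buffer capacity actually occupied by the compressed objects."""
        return sum(len(v) for v in self._store.values())


if __name__ == "__main__":
    buf = QuantizedObjectBuffer()
    flat_object = bytes([128]) * (256 * 256)  # an illustrative, highly uniform object
    buf.put("object-56a", flat_object)
    assert buf.get("object-56a") == flat_object
    print(len(flat_object), buf.stored_bytes())
```

A lossy first stage (DCT or wavelet, as in block 602) would precede the lossless stage in the preferred arrangement; it is omitted here to keep the round trip exact.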



VIDEO COMPRESSION DECODER PROCESS OVERVIEW 

Video compression encoder process 64 of Fig. 3 provides encoded or
compressed representations of video signals corresponding to video sequences of
multiple image frames. The compressed representations include object masks 66,
feature points 68, affine transform coefficients 104, and compressed error data 116
from encoder process 64 and compressed master objects 136 from encoder process
130. These compressed representations facilitate storage or transmission of video
information, and are capable of achieving compression ratios of up to 300 percent
greater than those achievable by conventional video compression methods such as
MPEG-2.

It will be appreciated, however, that retrieving such compressed video
information from data storage or receiving transmission of the video information
requires that it be decoded or decompressed to reconstruct the original video signal
so that it can be rendered by a display device such as video display device 52 (Figs.
2A and 2B). As with conventional encoding processes such as MPEG-1, MPEG-2,
and H.26X, the decompression or decoding of the video information is substantially
the inverse of the process by which the original video signal is encoded or
compressed.

Fig. 23A is a functional block diagram of a video compression decoder
process 700 for decompressing video information generated by video compression
encoder process 64 of Fig. 3. For purposes of consistency with the description of
encoder process 64, decoder process 700 is described with reference to Figs. 2A and
2B. Decoder process 700 retrieves from memory or receives as a transmission
encoded video information that includes object masks 66, feature points 68,
compressed master objects 136, affine transform coefficients 104, and compressed
error data 116.

Decoder process 700 performs operations that are the inverse of those of
encoder process 64 (Fig. 3). Accordingly, each of the above-described preferred
operations of encoder process 64 having a decoding counterpart would similarly be
inverted.

Function block 702 indicates that masks 66, feature points 68, transform
coefficients 104, and compressed error data 116 are retrieved from memory or
received as a transmission for processing by decoder process 700.

Fig. 23B is a functional block diagram of a master object decoder process
704 for decoding or decompressing compressed master object 136. Function block
706 indicates that compressed master object data 136 are entropy decoded by the
inverse of the conventional lossless entropy encoding method in function block 134
of Fig. 3B. Function block 708 indicates that the entropy decoded master object
from function block 706 is decoded according to an inverse of the conventional
lossy wavelet encoding process used in function block 132 of Fig. 3B.

Function block 712 indicates that dense motion transformations, preferably
multi-dimensional affine transformations, are generated from affine coefficients 104.
Preferably, affine coefficients 104 are quantized in accordance with transformation
method 350 (Fig. 12), and the affine transformations are generated from the
quantized affine coefficients by performing the inverse of the operations described
with reference to function block 362 (Fig. 12).

Function block 714 indicates that a quantized form of an object 716 in a
prior frame N-1 (e.g., rectangular solid object 56a in image frame 54a) provided via
a timing delay 718 is transformed by the dense motion transformation to provide a
predicted form of the object 720 in a current frame N (e.g., rectangular solid object
56b in image frame 54b).

Function block 722 indicates that for image frame N, predicted current object
720 is added to a quantized error 724 generated from compressed error data 116. In
particular, function block 726 indicates that compressed error data 116 is decoded by
an inverse process to that of compression process 114 (Fig. 3A). In the preferred
embodiment, function blocks 114 and 726 are based upon a conventional lossless
still image compression method such as entropy encoding.

Function block 728 indicates that the entropy decoded error data from
function block 726 is further decompressed or decoded by a conventional lossy still
image compression method corresponding to that utilized in function block 112
(Fig. 3A). In the preferred embodiment, the decompression or decoding of function block
728 is by a lattice subband or other wavelet process or a discrete cosine transform
(DCT) process.

Function block 722 provides quantized object 730 for frame N as the sum of
predicted object 720 and quantized error 724, representing a reconstructed or
decompressed object 732 that is delivered to timing delay 718 for reconstruction of
the object in subsequent frames.
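The feedback loop of function blocks 714, 718, and 722 can be sketched as follows. This is a deliberately simplified illustration: a single affine transformation per object replaces the dense motion transformation, its six coefficients are assumed to map current-frame coordinates back into the prior frame (inverse mapping) with nearest-neighbour sampling, and the quantized error is assumed to have already been decoded by blocks 726 and 728. All names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]
Affine = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def warp_affine(prior: Image, inv: Affine) -> Image:
    """Block 714 (simplified): each current pixel (x, y) takes the prior pixel at
    (a*x + b*y + c, d*x + e*y + f), nearest neighbour, 0 outside the frame."""
    a, b, c, d, e, f = inv
    h, w = len(prior), len(prior[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = round(a * x + b * y + c), round(d * x + e * y + f)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = prior[sy][sx]
    return out


def decode_sequence(first: Image, frames: List[Tuple[Affine, Image]]) -> List[Image]:
    """Blocks 714-722: predicted object plus quantized error, fed back via delay 718."""
    prior = first  # plays the role of timing delay 718
    decoded = []
    for inv, error in frames:
        predicted = warp_affine(prior, inv)
        current = [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(predicted, error)]
        decoded.append(current)
        prior = current  # the reconstructed object predicts the next frame
    return decoded


if __name__ == "__main__":
    shift_right = (1, 0, -1, 0, 1, 0)  # x_prior = x - 1: content moves one pixel right
    first = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    zeros = [[0.0] * 3 for _ in range(3)]
    for frame in decode_sequence(first, [(shift_right, zeros), (shift_right, zeros)]):
        print(frame)
```

Because each reconstructed object becomes the prior object for the next frame, any error not carried in the quantized error data accumulates, which is why the decoder must mirror the encoder's quantization exactly.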

Function block 734 indicates that quantized object 732 is assembled with 
other objects of a current image frame N to form a decompressed video signal. 

SIMPLIFIED CHAIN ENCODING 

Masks, objects, sprites, and other graphical features commonly are
represented by their contours. As shown in and explained with reference to FIG.
5A, for example, rectangular solid object 56a is bounded by an object perimeter or
contour 142. A conventional process of encoding or compressing contours is
referred to as chain encoding.

FIG. 24A shows a conventional eight-point chain code 800 from which
contours on a conventional rectilinear pixel array are defined. Based upon a
current pixel location X, a next successive pixel location in the contour extends in
one of directions 802a-802h. The chain code value for the next successive pixel is
the numeric value corresponding to the particular direction 802. As examples, the
right, horizontal direction 802a corresponds to the chain code value 0, and the
downward, vertical direction 802g corresponds to the chain code value 6. Any
continuous contour can be described from eight-point chain code 800.

With reference to FIG. 24B, a contour 804 represented by pixels 806
designated X and A-G can be encoded in a conventional manner by the chain code
sequence {00764432}. In particular, beginning from pixel X, pixels A and B are
positioned in direction 0 relative to respective pixels X and A. Pixel C is positioned
in direction 7 relative to pixel B. Remaining pixels D-G are similarly positioned in
directions corresponding to the chain code values listed above. In a binary
representation, each conventional chain code value is represented by three digital
bits.
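The conventional eight-point code can be sketched as below. The text fixes only that 0 is right and 6 is down; the sketch assumes the usual counterclockwise numbering implied by those two values (2 up, 4 left, odd values on the diagonals) with y increasing upward. The pixel coordinates in the usage example are a hypothetical reconstruction of contour 804 chosen to be consistent with the sequence {00764432}, since FIG. 24B is not reproduced here.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

# Direction code for each (dx, dy) step, y increasing upward:
# 0 = right, 2 = up, 4 = left, 6 = down; odd codes are the diagonals between them.
DIRECTIONS: Dict[Point, int] = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(contour: List[Point], closed: bool = True) -> List[int]:
    """Conventional eight-point chain code of successive 8-connected contour pixels."""
    pts = contour + [contour[0]] if closed else contour
    codes = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(x0, y0)} and {(x1, y1)} are not adjacent")
        codes.append(DIRECTIONS[step])
    return codes


def pack_bits(codes: List[int]) -> str:
    """Three bits per code, as in the conventional binary representation."""
    return "".join(format(c, "03b") for c in codes)


if __name__ == "__main__":
    # Hypothetical coordinates for pixels X, A, B, C, D, E, F, G of contour 804.
    contour = [(0, 0), (1, 0), (2, 0), (3, -1), (3, -2), (2, -2), (1, -2), (0, -1)]
    codes = chain_code(contour)
    print("".join(map(str, codes)), len(pack_bits(codes)), "bits")
```

Closing the contour back to X supplies the eighth code, so eight pixels cost 24 bits with the conventional scheme.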

FIG. 25A is a functional block diagram of a chain code process 810 of the
present invention capable of providing contour compression ratios at least about
twice those of conventional chain code processes. Chain code process 810 achieves
such improved compression ratios by limiting the number of chain codes and
defining them relative to the alignment of adjacent pairs of pixels. Based upon
experimentation, it has been discovered that the limited chain codes of chain code
process 810 directly represent more than 99.8% of pixel alignments of object or
mask contours. Special case chain code modifications accommodate the remaining
less than 0.2% of pixel alignments as described below in greater detail.

Function block 816 indicates that a contour is obtained for a mask, object, or 
sprite. The contour may be obtained, for example, by object segmentation process 
140 described with reference to FIGS. 4 and 5. 

Function block 818 indicates that an initial pixel in the contour is identified.
The initial pixel may be identified by common methods such as, for example, a pixel
with minimal X-axis and Y-axis coordinate positions.

Function block 820 indicates that a predetermined chain code is assigned to
represent the relationship between the initial pixel and the next adjacent pixel in the
contour. Preferably, the predetermined chain code is defined to correspond to the
forward direction.

FIG. 25B is a diagrammatic representation of a three-point chain code 822 of
the present invention. Chain code 822 includes three chain codes 824a, 824b, and
824c that correspond to a forward direction 826a, a leftward direction 826b, and a
rightward direction 826c, respectively. Directions 826a-826c are defined relative to
a preceding alignment direction 828 between a current pixel 830 and an adjacent
pixel 832 representing the preceding pixel in the chain code.

Preceding alignment direction 828 may extend in any of the directions 802
shown in Fig. 24A, but is shown with a specific orientation (i.e., right, horizontal)
for purposes of illustration. Direction 826a is defined, therefore, as the same as
direction 828. Directions 826b and 826c differ from direction 828 by leftward and
rightward displacements of one pixel.

It has been determined experimentally that slightly more than 50% of chain 
codes 824 correspond to forward direction 826a, and slightly less than 25% of chain 
codes 824 correspond to each of directions 826b and 826c. 

Function block 836 represents an inquiry as to whether the next adjacent
pixel in the contour conforms to one of directions 826. Whenever the next adjacent
pixel in the contour conforms to one of directions 826, function block 836 proceeds
to function block 838, and otherwise proceeds to function block 840.

Function block 838 indicates that the next adjacent pixel is assigned a chain
code 824 corresponding to its direction 826 relative to the direction 828 along which
the adjacent preceding pair of pixels is aligned.

Function block 840 indicates that a pixel sequence conforming to one of
directions 826 is substituted for the actual nonconformal pixel sequence. Based
upon experimentation, it has been determined that such substitutions typically will
arise in fewer than 0.2% of pixel sequences in a contour and may be accommodated
by one of six special-case modifications.

FIG. 25C is a diagrammatic representation of the six special-case
modifications 842 for converting nonconformal pixel sequences to pixel sequences
that conform to directions 826. Within each modification 842, a pixel sequence 844
is converted to a pixel sequence 846. In each of pixel sequences 844 of adjacent
respective pixels X', X", A, and B, the direction between pixels A and B does not
conform to one of directions 826 due to the alignment of pixel A relative to the
alignment of pixels X' and X".

In pixel sequence 844a, initial pixel alignments 850a and 852a represent a
nonconformal right-angle direction change. Accordingly, in pixel sequence 846a,
pixel A of pixel sequence 844a is omitted, resulting in a pixel direction 854a that
conforms to pixel direction 826a. Pixel sequence modifications 842b-842f similarly
convert nonconformal pixel sequences 844b-844f to conformal sequences 846b-846f,
respectively.

Pixel sequence modifications 842 omit pixels that cause pixel direction
alignments that change by 90° or more relative to the alignments of adjacent
preceding pixels X1 and X2. One effect is to increase the minimum radius of
curvature of a contour representing a right angle to three pixels. Pixel modifications
842 cause, therefore, a minor loss of extremely fine contour detail. According to
this invention, however, it has been determined that the loss of such details is
acceptable under most viewing conditions.

Function block 860 represents an inquiry as to whether there is another pixel
in the contour to be assigned a chain code. Whenever there is another pixel in the
contour to be assigned a chain code, function block 860 returns to function block 836,
and otherwise proceeds to function block 862.

Function block 862 indicates that nonconformal pixel alignment directions
introduced or incurred by the process of function block 840 are removed. In a
preferred embodiment, the nonconformal direction changes may be omitted simply
by returning to function block 816 and repeating process 810 until no nonconformal
pixel sequences remain, which typically is achieved in fewer than 8 iterations. In an
alternative embodiment, such incurred nonconformal direction changes may be
corrected in "real-time" by checking for and correcting any incurred nonconformal
direction changes each time a nonconformal direction change is modified.

Function block 864 indicates that a Huffman code is generated from the
resulting simplified chain code. With chain codes 824a-824c corresponding to
directions 826a-826c that occur for about 50%, 25% and 25% of pixels in a
contour, respective Huffman codes of 0, 11, and 10 are assigned. Such first order
Huffman codes allow chain code process 810 to represent contours at a bit rate of less
than 1.5 bits per pixel in the contour. Such a bit rate represents approximately a
50% compression ratio improvement over conventional chain code processes.
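The three-point relative code and its first order Huffman codes can be sketched on top of an eight-point direction sequence. Assumptions, labeled as such: directions use counterclockwise numbering with 0 pointing right, so a "leftward" step (826b) is a turn of +1 and a "rightward" step (826c) a turn of -1 (mod 8); the first absolute direction (block 820) is assumed to be sent separately; and nonconformal turns simply raise an error, where function block 840 would instead substitute a conformal pixel sequence. Names are hypothetical.

```python
from typing import List

# First order Huffman codes for forward, leftward and rightward, as listed above.
HUFFMAN = {"F": "0", "L": "11", "R": "10"}


def relative_codes(absolute: List[int]) -> List[str]:
    """Express each eight-point direction relative to the preceding one:
    F = same direction (826a), L = one step counterclockwise (826b),
    R = one step clockwise (826c). Any other turn is nonconformal."""
    out = []
    for prev, cur in zip(absolute, absolute[1:]):
        turn = (cur - prev) % 8
        if turn == 0:
            out.append("F")
        elif turn == 1:
            out.append("L")
        elif turn == 7:
            out.append("R")
        else:
            # Function block 840 would substitute a conformal pixel sequence here.
            raise ValueError(f"nonconformal turn from direction {prev} to {cur}")
    return out


def huffman_bits(relative: List[str]) -> str:
    """Concatenate the Huffman code of each relative chain code."""
    return "".join(HUFFMAN[r] for r in relative)


def expected_bits_per_pixel(p_forward: float, p_left: float, p_right: float) -> float:
    """Average code length for the given direction frequencies."""
    return (p_forward * len(HUFFMAN["F"]) + p_left * len(HUFFMAN["L"])
            + p_right * len(HUFFMAN["R"]))


if __name__ == "__main__":
    rel = relative_codes([0, 0, 1, 1, 0, 7, 7, 0])
    print(rel, huffman_bits(rel))
    print(expected_bits_per_pixel(0.5, 0.25, 0.25))
```

With the 50/25/25 split the average is 1.5 bits per pixel, against 3 bits for the conventional code; a forward share slightly above 50% pushes the average slightly below 1.5. Note that the sample sequence {00764432} from FIG. 24B contains a 90° turn (6 to 4) and would therefore need one of the special-case modifications 842.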

It will be appreciated that higher order Huffman coding can provide higher
compression ratios. Higher order Huffman coding includes, for example, assigning
predetermined values to preselected sequences of first order Huffman codes.

SPRITE GENERATION 

The present invention includes generating sprites for use in connection with
encoding determinate motion video (movie). Bitmaps are accreted into bitmap series
that comprise a plurality of sequential bitmaps of sequential images from an image
source. Accretion is used to overcome the problem of occluded pixels where objects
or figures move relative to one another or where one figure occludes another, in a
manner similar to the way a foreground figure occludes the background. For example, when a
foreground figure moves and reveals some new background, there is no way to build
that new background from a previous bitmap unless the previous bitmap was first
enhanced by including in it the pixels that were going to be uncovered in the
subsequent bitmap. This method takes an incomplete image of a figure and looks
forward in time to find any pixels that belong to the image but are not
immediately visible. Those pixels are used to create a composite bitmap for the
figure. With the composite bitmap, any future view of the figure can be created by
distorting the composite bitmap.

The encoding process begins by an operator identifying the figures and the
parts of the figures of a current bitmap from a current bitmap series. Feature or
distortion points are selected by the operator on the features of the parts about which
the parts of the figures move. A current grid of triangles is superimposed onto the
parts of the current bitmap. The triangles that constitute the current grid of triangles
are formed by connecting adjacent distortion points. The distortion points are the
vertices of the triangles. The current location of each triangle on the current bitmap
is determined and stored to the storage device. A portion of data of the current
bitmap that defines the first image within the current location of each triangle is
retained for further use.

A succeeding bitmap that defines a second image of the current bitmap series
is received from the image source, and the figures and the parts of the figure are
identified by the operator. Next, the current grid of triangles from the current
bitmap is superimposed onto the succeeding bitmap. The distortion points of the current
grid of triangles are realigned to coincide with the features of the corresponding
figures on the succeeding bitmap. The realigned distortion points form a succeeding
grid of triangles on the succeeding bitmap of the second image. The succeeding
location of each triangle on the succeeding bitmap is determined and stored to the
storage device. A portion of data of the succeeding bitmap that defines the second
image within the succeeding location of each triangle is retained for further use.

The process of determining and storing the current and succeeding locations
of each triangle is repeated for the plurality of sequential bitmaps of the current
bitmap series. When that process is completed, an average image of each triangle in
the current bitmap series is determined from the separately retained data. The
average image of each triangle is stored to the storage device.

During playback, the average image of each triangle of the current bitmap
series and the current location of each triangle of the current bitmap are retrieved
from the storage device. A predicted bitmap is generated by calculating a
transformation solution for transforming the average image of each triangle in the
current bitmap series to the current location of each triangle of the current bitmap
and applying the transformation solution to the average image of each triangle. The
predicted bitmap is passed to the monitor for display.
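A per-triangle transformation solution can be illustrated by solving for the six coefficients of an affine map from the three vertex correspondences of a triangle. This sketch assumes the affine form x' = a*x + b*y + c, y' = d*x + e*y + f (the procedure of Fig. 26 later names an affine transformation solution) and uses Cramer's rule for brevity; the function names are hypothetical, and resampling the triangle's pixels through the map is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangle_affine(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Solve for (a, b, c, d, e, f) so that each source vertex lands on the
    matching destination vertex."""
    m = [[float(x), float(y), 1.0] for x, y in src]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("source triangle is degenerate (collinear vertices)")

    def solve(rhs: List[float]) -> List[float]:
        # Cramer's rule: swap column k of m for the right-hand side.
        result = []
        for k in range(3):
            mk = [row[:] for row in m]
            for i in range(3):
                mk[i][k] = rhs[i]
            result.append(_det3(mk) / det)
        return result

    a, b, c = solve([p[0] for p in dst])
    d, e, f = solve([p[1] for p in dst])
    return a, b, c, d, e, f


def apply_affine(t: Affine, p: Point) -> Point:
    """Map a point through the affine transformation."""
    a, b, c, d, e, f = t
    x, y = p
    return a * x + b * y + c, d * x + e * y + f


if __name__ == "__main__":
    # Average-image triangle at the unit corner, current location scaled by 2 and moved to (2, 3).
    t = triangle_affine([(0, 0), (1, 0), (0, 1)], [(2, 3), (4, 3), (2, 5)])
    print(t, apply_affine(t, (0.5, 0.5)))
```

Three non-collinear vertices determine an affine map uniquely, so each triangle of the grid gets its own exact solution; the same computation applies to the sprite-bitmap triangles in the video game case described next.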

In connection with a playback determinate motion video (video game) in
which the images are determined by a controlling program at playback, a sprite
bitmap is stored in its entirety on a storage device. The sprite bitmap comprises a
plurality of data bits that define a sprite image. The sprite bitmap is displayed on a
monitor, and the parts of the sprite are identified by an operator and distortion points
are selected for the sprite's parts.

A grid of triangles is superimposed onto the parts of the sprite bitmap. The
triangles that constitute the grid of triangles are formed by connecting adjacent
distortion points. The distortion points are the vertices of the triangles. The
location of each triangle of the sprite bitmap is determined and stored to the storage
device.

During playback, a succeeding location of each triangle is received from a
controlling program. The sprite bitmap and the succeeding location of each triangle 
on the sprite bitmap are recalled from the storage device and passed to the display 
processor. The succeeding location of each triangle is also passed to the display 
processor. 

A transformation solution is calculated for each triangle on the sprite bitmap.
A succeeding bitmap is then generated in the display processor by applying the
transformation solution of each triangle derived from the sprite bitmap that defines
the sprite image within the location of each triangle. The display processor passes
the succeeding sprite bitmap to a monitor for display. This process is repeated for
each succeeding location of each triangle requested by the controlling program.

As shown in Fig. 26, an encoding procedure for a movie motion video begins 
at step 900 by the CPU 22 receiving from an image source a current bitmap series. 
The current bitmap series comprises a plurality of sequential bitmaps of sequential 
images. The current bitmap series has a current bitmap that comprises a plurality of 
data bits which define a first image from the image source. The first image 
comprises at least one figure having at least one part. 

Proceeding to step 902, the first image is displayed to the operator on the 
monitor 28. From the monitor 28, the figures of the first image on the current 
bitmap are identified by the operator. The parts of the figure on the current bitmap 
are then identified by the operator at step 904. 

Next, at step 906, the operator selects feature or distortion points on the 
current bitmap. The distortion points are selected so that the distortion points 
coincide with features on the bitmap where relative movement of a part is likely to 
occur. It will be understood by those skilled in the art that the figures, the parts of 
the figures and the distortion points on a bitmap may be identified by the computer 
system 20 or by assistance from it. It is preferred, however, that the operator 
identify the figures, the parts of the figures and the distortion points on a bitmap. 

Proceeding to step 908, a current grid of triangles is superimposed onto the
parts of the current bitmap by the computer system 20. With reference to Fig. 27A,
the current grid comprises triangles formed by connecting adjacent distortion points.
The distortion points form the vertices of the triangles. More specifically, the first
image of the current bitmap comprises a figure, which is a person 970. The person
970 has six parts corresponding to a head 972, a torso 974, a right arm 976, a left
arm 978, a right leg 980, and a left leg 982. Distortion points are selected on each
part of the person 970 so that the distortion points coincide with features where
relative movement of a part is likely to occur. A current grid is superimposed over
each part with the triangles of each current grid formed by connecting adjacent
distortion points. Thus, the distortion points form the vertices of the triangles.

At step 910, the computer system 20 determines a current location of each
triangle on the current bitmap. The current location of each triangle on the current
bitmap is defined by the location of the distortion points that form the vertices of the
triangle. At step 912, the current location of each triangle is stored to the storage
device. A portion of data derived from the current bitmap that defines the first
image within the current location of each triangle is retained at step 914.

Next, at step 916, a succeeding bitmap of the current bitmap series is
received by the CPU 22. The succeeding bitmap comprises a plurality of data bits
which define a second image of the current bitmap series. The second image may or
may not include figures that correspond to the figures in the first image. For the
following steps, the second image is assumed to have figures that correspond to the
figures in the first image. At step 918, the current grid of triangles is superimposed
onto the succeeding bitmap. The second image with the superimposed triangular
grid is displayed to the operator on the monitor 28.

At step 920, the distortion points are realigned to coincide with
corresponding features on the succeeding bitmap by the operator with assistance
from the computer system 20. The computer system 20 realigns the distortion points
using block matching. Any mistakes are corrected by the operator. With reference to
Fig. 27B, the realigned distortion points form a succeeding grid of triangles. The
realigned distortion points are the vertices of the triangles. More specifically, the
second image of the succeeding bitmap of person 970 includes head 972, torso 974,
right arm 976, left arm 978, right leg 980, and left leg 982. In the second image,
however, the right arm 976 is raised. The current grids of the first image have been
superimposed over each part and their distortion points realigned to coincide with
corresponding features on the second image. The realigned distortion points define
succeeding grids of triangles. The succeeding grids comprise triangles formed by
connecting the realigned distortion points. Thus, the realigned distortion points form
the vertices of the triangles of the succeeding grids.

Proceeding to step 922, a succeeding location of each triangle of the
succeeding bitmap is determined by the computer system 20. At step 924, the
succeeding location of each triangle on the succeeding bitmap is stored to the storage
device. A portion of data derived from the succeeding bitmap that defines the
second image within the succeeding location of each triangle is retained at step 926.
Step 926 leads to decisional step 928 where it is determined if a next succeeding
bitmap exists.

If a next succeeding bitmap exists, the YES branch of decisional step 928 
leads to step 930 where the succeeding bitmap becomes the current bitmap. Step 
930 returns to step 916 where a succeeding bitmap of the current bitmap series is 
received by the CPU 22. If a next succeeding bitmap does not exist, the NO branch 
of decisional step 928 leads to step 932 where an average image for each triangle of 
the current bitmap series is determined. The average image is the median value of 
the pixels of a triangle. Use of the average image makes the process less susceptible 
to degeneration. Proceeding to step 934, the average image of each triangle of the 
current bitmap series is stored to the storage device. 
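Step 932 can be sketched as a per-pixel median over the data retained for one triangle. The sketch assumes, beyond what the text states, that every retained sample has already been mapped onto a common reference triangle so that the same indices describe the same point in every frame; the function name is hypothetical.

```python
from statistics import median
from typing import List

Grid = List[List[float]]


def average_triangle_image(samples: List[Grid]) -> Grid:
    """Step 932 (sketch): per-pixel median of one triangle's retained data across
    the bitmap series, assuming all samples share one reference geometry."""
    rows, cols = len(samples[0]), len(samples[0][0])
    return [
        [median([s[i][j] for s in samples]) for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    frames = [
        [[1, 2], [3, 4]],
        [[1, 9], [3, 4]],    # a transient value, e.g. a briefly occluding figure
        [[1, 2], [100, 4]],  # another outlier in a different pixel
    ]
    print(average_triangle_image(frames))
```

Using the median rather than the mean means a value seen in only a minority of frames, such as a briefly occluding foreground pixel, does not leak into the average image, which is one way to read the text's remark that the process becomes less susceptible to degeneration.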

Next, at step 936, the current location of each triangle on the current bitmap 
is retrieved from the storage device. An affine transformation solution for 
transforming the average image of each triangle to the current location of the 
triangle on the current bitmap is then calculated by the computer system 20 at step 
938. At step 940, a predicted bitmap is generated by applying the transformation 
solution of the average image of each triangle to the current location of each triangle 
on the current bitmap. The predicted bitmap is compared with the current bitmap at 
step 942. 
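Because a triangle is fixed by its three vertices, the affine transformation solution of step 938 can be found from three vertex correspondences: the six coefficients of x' = ax + by + c and y' = dx + ey + f follow from two 3x3 linear systems that share one matrix. The sketch below solves them with Cramer's rule; the function name and the coefficient ordering are illustrative assumptions, not taken from the specification.

```python
def solve_affine(src, dst):
    """Coefficients (a, b, c, d, e, f) such that each src vertex (x, y) maps to
    its dst vertex via x' = a*x + b*y + c and y' = d*x + e*y + f."""
    (x0, y0), (x1, y1), (x2, y2) = src
    # Determinant of [[x0, y0, 1], [x1, y1, 1], [x2, y2, 1]].
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)
    if det == 0:
        raise ValueError("source triangle is degenerate (collinear vertices)")

    def solve(v0, v1, v2):
        # Cramer's rule for the system [x_k, y_k, 1] . (p, q, r) = v_k, k = 0..2.
        p = v0 * (y1 - y2) - y0 * (v1 - v2) + (v1 * y2 - v2 * y1)
        q = x0 * (v1 - v2) - v0 * (x1 - x2) + (x1 * v2 - x2 * v1)
        r = x0 * (y1 * v2 - y2 * v1) - y0 * (x1 * v2 - x2 * v1) + v0 * (x1 * y2 - x2 * y1)
        return p / det, q / det, r / det

    a, b, c = solve(*(pt[0] for pt in dst))
    d, e, f = solve(*(pt[1] for pt in dst))
    return a, b, c, d, e, f


coeffs = solve_affine([(0, 0), (1, 0), (0, 1)], [(2, 3), (4, 3), (2, 6)])
print(coeffs)  # (2.0, 0.0, 2.0, 0.0, 3.0, 3.0)
```

The same solver serves the decoder's step 1010, since it too maps each triangle's average image to the triangle's current location.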

At step 944, a correction bitmap is generated. The correction bitmap
comprises the data bits of the current bitmap that were not accurately predicted by
the predicted bitmap. The correction bitmap is stored to the storage device at step
948. Step 948 leads to decisional step 950 where it is determined if a succeeding
bitmap exists.
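One way to read step 944 is sketched below: for every pixel whose prediction error exceeds a tolerance, the correction keeps the actual value from the current bitmap, so the stored correction stays sparse when the prediction is good. The tolerance parameter, the dictionary representation, and the helper name are assumptions for illustration; with a tolerance of 0 the correction is exact.

```python
def correction_bitmap(current, predicted, tolerance=0):
    """Hypothetical helper: keep only the pixels the prediction got wrong.

    Returns {(row, col): value} holding the current bitmap's value for every
    pixel whose absolute prediction error exceeds `tolerance`.
    """
    corrections = {}
    for r, (cur_row, pred_row) in enumerate(zip(current, predicted)):
        for c, (cur, pred) in enumerate(zip(cur_row, pred_row)):
            if abs(cur - pred) > tolerance:
                corrections[(r, c)] = cur
    return corrections


current = [[5, 5], [5, 9]]
predicted = [[5, 6], [5, 5]]
print(correction_bitmap(current, predicted))               # {(0, 1): 5, (1, 1): 9}
print(correction_bitmap(current, predicted, tolerance=1))  # {(1, 1): 9}
```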

If a succeeding bitmap exists, the YES branch of decisional step 950 leads to
step 952 where the succeeding bitmap becomes the current bitmap. Step 952 returns
to step 936 where the current location of each triangle on the current bitmap is
retrieved from the storage device. If a next succeeding bitmap does not exist, the
NO branch of decisional step 950 leads to decisional step 954 where it is determined
if a succeeding bitmap series exists. If a succeeding bitmap series does not exist,
encoding is finished and the NO branch of decisional step 954 leads to step 956. If
a succeeding bitmap series exists, the YES branch of decisional step 954 leads to
step 958 where the CPU 22 receives the succeeding bitmap series as the current
bitmap series. Step 958 returns to step 902 where the figures of the first image of
the current bitmap series are identified by the operator.

The process of Fig. 26 describes generation of a sprite or master object 90 
for use by encoder process 64 of Fig. 3. The process of utilizing master object 90 
to form predicted objects 102 is described with reference to Fig. 28. 

As shown in Fig. 28, the procedure begins at step 1000 with a current bitmap
series being retrieved. The current bitmap series comprises a plurality of sequential
bitmaps of sequential images. The current bitmap series has a current bitmap that
comprises a plurality of data bits which define a first image from the image source.
The first image comprises at least one figure having at least one part.

At step 1002, the average image of each triangle of the current bitmap series
is retrieved from the storage device. The average image of each triangle is then
passed to a display processor (not shown) at step 1004. It will be appreciated that
computer system 20 (Fig. 1) can optionally include a display processor or other
dedicated components for executing the processes of this invention. Proceeding to
step 1006, the current location of each triangle on the current bitmap is retrieved
from the storage device. The current location of each triangle is passed to the
display processor at step 1008.
Next, an affine transformation solution for transforming the average image of
each triangle to the current location of each triangle on the current bitmap is
calculated by the display processor at step 1010. Proceeding to step 1012, a
predicted bitmap is generated by the display processor by applying the
transformation solution for transforming the average image of each triangle to the
current location of each triangle on the current bitmap.
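Step 1012 (like step 940 in the encoder) applies each triangle's transformation to produce the predicted bitmap. The specification does not say how pixels are resampled, so the sketch below assumes inverse mapping, in which each destination pixel inside the triangle looks up its source position, with nearest-neighbour sampling. The helper names and the coefficient order are hypothetical.

```python
import math


def _inside(px, py, tri):
    """True if (px, py) lies inside or on an edge of triangle tri."""
    (x0, y0), (x1, y1), (x2, y2) = tri

    def edge(ax, ay, bx, by):
        # Sign tells which side of edge a->b the point falls on.
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    d = (edge(x0, y0, x1, y1), edge(x1, y1, x2, y2), edge(x2, y2, x0, y0))
    return not (any(v < 0 for v in d) and any(v > 0 for v in d))


def warp_triangle(source, dest, dst_tri, inv):
    """Fill the pixels of dst_tri in dest by sampling source.

    inv = (a, b, c, d, e, f) maps a destination pixel (x, y) to the source
    position (a*x + b*y + c, d*x + e*y + f). Images are lists of rows, so a
    pixel is image[y][x]. Source positions outside the source are skipped.
    """
    a, b, c, d, e, f = inv
    xs = [p[0] for p in dst_tri]
    ys = [p[1] for p in dst_tri]
    for y in range(max(0, math.floor(min(ys))), min(len(dest) - 1, math.ceil(max(ys))) + 1):
        for x in range(max(0, math.floor(min(xs))), min(len(dest[0]) - 1, math.ceil(max(xs))) + 1):
            if not _inside(x, y, dst_tri):
                continue
            sx, sy = round(a * x + b * y + c), round(d * x + e * y + f)
            if 0 <= sy < len(source) and 0 <= sx < len(source[0]):
                dest[y][x] = source[sy][sx]
    return dest


src = [[10 * y + x for x in range(4)] for y in range(4)]
out = warp_triangle(src, [[0] * 4 for _ in range(4)], [(0, 0), (3, 0), (0, 3)], (1, 0, 1, 0, 1, 0))
print(out[1][1])  # 12
```

Inverse mapping is a common choice because every destination pixel receives exactly one value, so the predicted triangle has no holes.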

At step 1014, a correction bitmap for the current bitmap is retrieved from the
storage device. The correction bitmap is passed to the display processor at step 1016.
A display bitmap is then generated in the display processor by overlaying the
predicted bitmap with the correction bitmap. The display processor retains a copy of
the average image of each triangle and passes the display bitmap to the frame buffer
for display on the monitor.
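The overlay step can be sketched as follows. This assumes the correction bitmap is held as a sparse mapping from (row, column) to the replacement pixel value, a representation chosen here for illustration; the function name overlay is hypothetical.

```python
def overlay(predicted, corrections):
    """Return a display bitmap: predicted pixels, replaced where corrected.

    corrections maps (row, col) to the pixel value to display there.
    """
    display = [row[:] for row in predicted]  # copy; the prediction is left unchanged
    for (r, c), value in corrections.items():
        display[r][c] = value
    return display


predicted = [[5, 6], [5, 5]]
print(overlay(predicted, {(0, 1): 5, (1, 1): 9}))  # [[5, 5], [5, 9]]
```

Copying before overlaying keeps the predicted bitmap intact, so it can be reused or inspected after the display bitmap is sent to the frame buffer.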

Next, at decisional step 1020, it is determined if a succeeding bitmap of the
current bitmap series exists. If a succeeding bitmap of the current bitmap series
exists, the YES branch of decisional step 1020 leads to step 1022. At step 1022, the
succeeding bitmap becomes the current bitmap. Step 1022 returns to step 1006
where the location of each triangle on the current bitmap is retrieved from the
storage device.

Returning to decisional step 1020, if a succeeding bitmap of the current
bitmap series does not exist, the NO branch of decisional step 1020 leads to
decisional step 1024. At decisional step 1024, it is determined if a succeeding
bitmap series exists. If a succeeding bitmap series does not exist, then the process is
finished and the NO branch of decisional step 1024 leads to step 1026. If a
succeeding bitmap series exists, the YES branch of decisional step 1024 leads to step
1028. At step 1028, the succeeding bitmap series becomes the current bitmap series.
Step 1028 returns to step 1000.

Having illustrated and described the principles of the present invention in a
preferred embodiment, it should be apparent to those skilled in the art that the
embodiment can be modified in arrangement and detail without departing from such
principles. Accordingly, we claim as our invention all such embodiments as come
within the scope and spirit of the following claims and equivalents thereto.
WE CLAIM:

1. A method of encoding in a compressed format information within a video
image frame sequence having first and second video image frames that include an
arbitrary image feature with an arbitrary configuration, the arbitrary image feature
having different attributes in the first and second video image frames, the method
comprising:

determining a dense motion transformation between the arbitrary image
feature in the first and second video image frames to determine an estimated
arbitrary image feature in the second video image frame; and

identifying a difference between the estimated arbitrary image feature in the
second video image frame and the arbitrary image feature in the second video image
frame to determine a transform error for the arbitrary image feature.

2. The method of claim 1 in which the arbitrary image feature includes as
attributes a position, an orientation, and a configuration in each of the first and
second video image frames and the difference between the attributes of the arbitrary
image feature in the first and second video image frames includes a difference in at
least one of the position, orientation, or configuration.

3. The method of claim 1 further comprising applying the transform error to
the estimated arbitrary image feature in the second video image frame to form a
corrected image feature in the second video image frame.

4. The method of claim 3 in which the video image frame sequence further
includes a third video image frame that includes the arbitrary image feature and the
method further comprises determining a dense motion transformation between the
corrected image feature in the second video image frame and the arbitrary image
feature in the third video image frame to determine an estimated arbitrary image
feature in the third video image frame.

5. The method of claim 1 in which image features in the video image frame
sequence are formed from plural pixels and in which determining the dense motion
transformation between the arbitrary image feature in the first and second video
image frames includes identifying corresponding pixels of the arbitrary image feature
in the first and second video image frames.
6. The method of claim 5 further comprising identifying all corresponding
pixels of the arbitrary image feature in the first and second video image frames.

7. The method of claim 1 in which determining the dense motion 
transformation between the arbitrary image feature in the first and second video 
image frames includes determining affine motion transformations between the 
arbitrary image feature in the first and second video image frames. 

8. The method of claim 1 in which the first video image frame precedes the 
second video image frame. 

9. The method of claim 1 further comprising encoding the transform error in
a first compressed format, decoding the transform error from the first compressed
format to form a quantized transform error, and correcting the estimated arbitrary
image feature in the second video image frame according to the quantized transform error.

10. The method of claim 9 in which the transform error encoded in the first
compressed format is a lossy representation of the transform error.

11. The method of claim 9 further comprising encoding in a second
compressed format the transform error encoded in the first compressed format.

12. The method of claim 11 in which the first and second compressed
formats are, respectively, lossy and lossless compression formats.
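Claims 9 through 12 describe encoding the transform error lossily, optionally re-encoding it losslessly, and correcting the estimate with the decoded, quantized error rather than the exact one, so that the encoder works from the same values a decoder will see. The sketch below uses uniform scalar quantization as the lossy format and run-length coding as the lossless format; both are stand-ins chosen for illustration, not the formats the claims require, and all names are hypothetical.

```python
def quantize(errors, step):
    """Lossy stage: index of the nearest multiple of step for each error.
    (Python's round() breaks .5 ties to the even integer.)"""
    return [round(e / step) for e in errors]


def dequantize(indices, step):
    """Map quantization indices back to approximate error values."""
    return [i * step for i in indices]


def rle_encode(values):
    """Lossless stage: collapse runs of equal values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    """Exact inverse of rle_encode."""
    return [v for v, count in runs for _ in range(count)]


errors = [0, 1, -1, 0, 0, 7, 8]          # transform error for one row of pixels
estimate = [50, 50, 50, 50, 50, 50, 50]  # estimated feature pixels
encoded = rle_encode(quantize(errors, 4))
quantized_error = dequantize(rle_decode(encoded), 4)
corrected = [p + q for p, q in zip(estimate, quantized_error)]
print(encoded)    # [[0, 5], [2, 2]]
print(corrected)  # [50, 50, 50, 50, 50, 58, 58]
```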

13. The method of claim 1 in which the first and second video image frames 
further include plural other arbitrary image features with arbitrary configurations, at 
least one of the other arbitrary image features having different attributes in the first 
and second video image frames, the method further comprising: 

determining dense motion transformations between the other arbitrary image 
features in the first and second video image frames to determine estimated other 
arbitrary image features in the second video image frame; and 

identifying differences between the estimated other arbitrary image features in 
the second video image frame and the other arbitrary image features in the second 
video image frame to determine transform errors for the other arbitrary image 
features. 

14. A data structure stored on a computer-readable medium and representing 
in a compressed format information within a video image frame sequence having 
plural video image frames that include plural image features that have different
attributes in the selected ones of the video image frames, the data structure 
comprising: 

selected image feature data corresponding to selected characteristics of the
plural image features;

affine transform coefficient data corresponding to coefficients of affine
transformations that represent changes of the plural image features between the
selected ones of the video image frames; and

transform error data corresponding to errors in the changes of the plural
image features represented by the affine transformations.

15. The data structure of claim 14 in which at least one of the affine transform
coefficient data and the transform error data is encoded in a compressed format.

16. The data structure of claim 14 in which the selected characteristics of the
plural image features include binary mask representations of the plural image
features.

17. The data structure of claim 14 in which the selected characteristics of the 
plural image features include plural selected pixels from each of the plural image 
features. 

18. The data structure of claim 14 in which the selected characteristics of the
plural image features include sprites that each represent the different attributes of
one of the image features in the plural video image frames.
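The data structure of claims 14 through 18 can be sketched in memory as one record per image feature. This assumes one affine coefficient set and one compressed transform-error entry per frame-to-frame change, and uses a binary mask as the selected feature data (claim 16); a sprite (claim 18) could take its place. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (a, b, c, d, e, f) for x' = a*x + b*y + c, y' = d*x + e*y + f (assumed order)
AffineCoefficients = Tuple[float, float, float, float, float, float]


@dataclass
class FeatureRecord:
    """Per-feature portion of the claimed data structure (names hypothetical)."""
    mask: List[List[int]]  # selected feature data, here a binary mask
    affine: List[AffineCoefficients] = field(default_factory=list)  # one per change
    transform_error: List[bytes] = field(default_factory=list)      # compressed error per change

    def transitions(self) -> int:
        """Number of frame-to-frame changes recorded for this feature."""
        if len(self.affine) != len(self.transform_error):
            raise ValueError("each affine transformation needs matching error data")
        return len(self.affine)


@dataclass
class CompressedSequence:
    features: List[FeatureRecord] = field(default_factory=list)


record = FeatureRecord(mask=[[0, 1], [1, 1]])
record.affine.append((1.0, 0.0, 2.0, 0.0, 1.0, 0.0))
record.transform_error.append(b"\x00\x05")
sequence = CompressedSequence(features=[record])
print(sequence.features[0].transitions())  # 1
```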

19. A method of decoding compressed information relating to an arbitrary
image feature with an arbitrary configuration within first and second video image
frames of a video image frame sequence, the arbitrary image feature having different
attributes in the first and second video image frames, the method comprising:

applying a dense motion transformation to a representation of the arbitrary
image feature in the first video image frame to form an estimated arbitrary image
feature in the second video image frame; and

applying a transform difference to the estimated arbitrary image feature in the
second video image frame to obtain the arbitrary image feature in the second video
image frame.
20. The method of claim 19 in which the video image frame sequence
further includes a third video image frame that includes the arbitrary image feature
and the method further comprises:

applying a dense motion transformation to the arbitrary image feature in the
second video image frame to form an estimated arbitrary image feature in the third
video image frame; and

applying a transform difference to the estimated arbitrary image feature in the
third video image frame to obtain the arbitrary image feature in the third video
image frame.

21. The method of claim 19 in which image features in the video image
frame sequence are formed from plural pixels and in which the dense motion
transformation represents correlations between corresponding pixels of the arbitrary
image feature in the first and second video image frames.

22. The method of claim 19 in which the dense motion transformation
includes an affine motion transformation between the arbitrary image feature in the
first and second video image frames.

23. The method of claim 19 in which the first video image frame precedes 
the second video image frame. 

24. The method of claim 19 in which the first and second video image
frames further include plural other arbitrary image features with arbitrary
configurations, at least one of the other arbitrary image features having different
attributes in the first and second video image frames, the method further comprising:

applying dense motion transformations to representations of the other
arbitrary image features in the first video image frame to form estimated other
arbitrary image features in the second video image frame; and

applying transform differences to the estimated other arbitrary image features
in the second video image frame to obtain the other arbitrary image features in the
second video image frame.

25. A computer-readable medium storing computer-executable programming
for encoding in a compressed format information within a video image frame
sequence having first and second video image frames that include an arbitrary image
feature with an arbitrary configuration, the arbitrary image feature having different
attributes in the first and second video image frames, the medium comprising:

programming for determining a dense motion transformation between the
arbitrary image feature in the first and second video image frames to determine an
estimated arbitrary image feature in the second video image frame; and

programming for identifying a difference between the estimated arbitrary
image feature in the second video image frame and the arbitrary image feature in the
second video image frame to determine a transform error for the arbitrary image
feature.

26. A computer-readable medium storing computer-executable programming
for decoding compressed information relating to an arbitrary image feature with an
arbitrary configuration within first and second video image frames of a video image
frame sequence, the arbitrary image feature having different attributes in the first and
second video image frames, the medium comprising:

programming for applying a dense motion transformation to a representation
of the arbitrary image feature in the first video image frame to form an estimated
arbitrary image feature in the second video image frame; and

programming for applying a transform difference to the estimated arbitrary
image feature in the second video image frame to obtain the arbitrary image feature
in the second video image frame.

27. A method of encoding in a compressed format information within a
video image frame sequence having first and second video image frames that include
an image component, the image component having different attributes in the first
and second video image frames, the method comprising:

determining a motion transformation between the image component in the
first and second video image frames to determine an estimated image component in
the second video image frame;

identifying a difference between the estimated image component in the
second video image frame and the image component in the second video image
frame to determine a transform error for the image component;

applying the transform error to the estimated image component in the second
video image frame to form a corrected image component in the second video image
frame; and

encoding the corrected image component in a first compressed format.

28. A block matching motion estimation method for estimating motion of
corresponding pixels between first and second video image frames, comprising:

defining a reference pixel block of multiple pixels relative to a first reference
pixel in the first video image frame and a sample pixel block of multiple sample
pixels in the second video image frame, the reference pixel block being a
non-quadrilateral polygonal array of pixels;

determining and storing for the pixels in the sample pixel block correlations
to the pixels in the reference pixel block; and

identifying from the correlations a first sample pixel corresponding to the
first reference pixel.

29. The method of claim 28 in which the first reference pixel and the first
sample pixel are included in arbitrary first and second image features in the first and
second video image frames, respectively, the arbitrary first image feature having an
interior that is bounded by an image feature perimeter and the reference pixel block
including a pixel block perimeter that conforms to the image feature perimeter.

30. The method of claim 29 further comprising:

defining relative to the first reference pixel a preliminary quadrilateral
reference pixel block of plural pixels;

identifying the pixels of the preliminary quadrilateral pixel block as to
whether they are in the interior of the arbitrary first image feature, at least one of
the pixels in the preliminary quadrilateral pixel block not being in the interior of the
arbitrary first image feature; and

establishing as the reference pixel block the pixels of the preliminary
quadrilateral pixel block in the interior of the arbitrary first image feature.
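Claim 30 starts from a preliminary quadrilateral (here square) block and keeps only the pixels inside the arbitrary feature, so the resulting block can be non-quadrilateral and background pixels do not influence the match. A minimal sketch, assuming the feature is given as a binary mask and using summed absolute differences as the correlation (the measure claim 32 names); the helper names are hypothetical.

```python
def conforming_block(mask, row, col, half):
    """Offsets (dr, dc) of the square (2*half+1) x (2*half+1) block around
    (row, col) whose pixels lie in the feature interior (mask value 1)."""
    offsets = []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r, c = row + dr, col + dc
            if 0 <= r < len(mask) and 0 <= c < len(mask[0]) and mask[r][c]:
                offsets.append((dr, dc))
    return offsets


def masked_sad(ref, sample, ref_pos, sample_pos, offsets):
    """Summed absolute differences over the conforming offsets only.
    The caller keeps sample_pos far enough from the frame edge."""
    (r0, c0), (r1, c1) = ref_pos, sample_pos
    return sum(abs(ref[r0 + dr][c0 + dc] - sample[r1 + dr][c1 + dc])
               for dr, dc in offsets)


mask = [[1, 0, 0],
        [1, 1, 0],
        [1, 1, 1]]
block = conforming_block(mask, 1, 1, 1)
print(block)  # [(-1, -1), (0, -1), (0, 0), (1, -1), (1, 0), (1, 1)]
```

The staircase of six offsets in the example is the kind of non-quadrilateral reference block the claim describes.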

31. The method of claim 29 in which the arbitrary first image feature
includes plural image feature pixels, the method further comprising:

defining a reference pixel block of multiple pixels relative to each image
feature pixel in the first video image frame and defining a corresponding sample
pixel block of multiple sample pixels in the second video image frame, at least one
of the reference pixel blocks being a non-quadrilateral polygonal array of pixels;

determining and storing for the pixels in the sample pixel block correlations
to the image feature pixel relative to which the corresponding reference pixel block
is defined; and

identifying from the correlations selected sample pixels in correlation with
the image feature pixels relative to which the corresponding reference pixel blocks
are defined.

32. The method of claim 28 in which determining correlations includes
determining summed absolute errors between pixels of the sample pixel block and
the reference pixel block.

33. The method of claim 32 in which the first sample pixel corresponding to 
the first reference pixel is the sample pixel for which the summed absolute error 
between the sample pixel block and the reference pixel block is minimal. 

34. The method of claim 32 in which each of the reference and sample
pixels is represented by three color component values and the summed absolute
errors E are determined as:

$$E = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( |r_{ij} - r'_{ij}| + |g_{ij} - g'_{ij}| + |b_{ij} - b'_{ij}| \right),$$

in which $r_{ij}$, $g_{ij}$, and $b_{ij}$ correspond to the color component values representing the
reference pixels and $r'_{ij}$, $g'_{ij}$, and $b'_{ij}$ correspond to the color component values
representing the sample pixels.
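The error measure of claim 34 and the minimum-error selection of claim 33 can be sketched as an exhaustive search over a square window. This assumes the block is indexed from its top-left pixel, that each pixel is an (r, g, b) tuple, and that the search window is plus or minus `search` pixels; the function names and the window shape are illustrative assumptions.

```python
def sae(ref, sample, ref_pos, sample_pos, m, n):
    """Summed absolute error E over an m x n block of (r, g, b) pixels."""
    (r0, c0), (r1, c1) = ref_pos, sample_pos
    total = 0
    for i in range(m):
        for j in range(n):
            pr = ref[r0 + i][c0 + j]
            ps = sample[r1 + i][c1 + j]
            # |r - r'| + |g - g'| + |b - b'| for this pixel pair
            total += sum(abs(a - b) for a, b in zip(pr, ps))
    return total


def best_match(ref, sample, ref_pos, m, n, search):
    """Exhaustive search within +/- search pixels; returns (error, (row, col))
    of the sample block with minimal E (first one found on ties)."""
    best = None
    rows, cols = len(sample), len(sample[0])
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = ref_pos[0] + dr, ref_pos[1] + dc
            if 0 <= r and r + m <= rows and 0 <= c and c + n <= cols:
                err = sae(ref, sample, ref_pos, (r, c), m, n)
                if best is None or err < best[0]:
                    best = (err, (r, c))
    return best


# The sample frame is the reference frame shifted one pixel to the right.
ref = [[(10 * y + x, 0, 0) for x in range(5)] for y in range(4)]
sample = [[(10 * y + x - 1, 0, 0) for x in range(5)] for y in range(4)]
print(best_match(ref, sample, (1, 1), 2, 2, 1))  # (0, (1, 2))
```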

35. The method of claim 34 in which the three color component values
correspond to red, green, and blue color components.

36. The method of claim 28 in which the first video image frame precedes 
the second video image frame. 

37. A block matching motion estimation method for estimating motion of
pixels between first and second video image frames that include an arbitrary first
image feature of plural image feature pixels, comprising:

identifying the arbitrary first image feature in the first video image frame;

defining a reference pixel block of multiple pixels relative to each image
feature pixel in the first video image frame and a corresponding sample pixel block
of multiple sample pixels in the second video image frame;

identifying for each reference pixel block in the first video image frame the 
pixels of the arbitrary first image feature; and 

identifying from the sample pixels in the sample pixel block first sample 
pixels corresponding to the image feature pixels. 

38. The method of claim 37 in which the arbitrary first image feature has an 
interior that is bounded by an image feature perimeter, the method further 
comprising: 

defining relative to each image feature pixel a preliminary quadrilateral 
reference pixel block of plural pixels; 

identifying the pixels of the preliminary quadrilateral reference pixel block as 
to whether they are in the interior of the arbitrary first image feature; and 

establishing as the reference pixel block the pixels of the preliminary
quadrilateral reference pixel block in the interior of the arbitrary first image feature.

39. The method of claim 38 in which at least one of the pixels in the 
preliminary quadrilateral pixel block is not in the interior of the arbitrary first image 
feature and the reference pixel block is defined to include a pixel block perimeter 
that conforms to the image feature perimeter. 

40. The method of claim 37 in which the reference pixel block is a
non-quadrilateral polygonal array of pixels.

41. A computer-readable medium storing computer-executable programming 
for estimating motion of corresponding pixels between first and second video image 
frames, the medium comprising: 

programming for defining a reference pixel block of multiple pixels relative 
to a first reference pixel in the first video image frame and a sample pixel block of 
multiple sample pixels in the second video image frame, the reference pixel block 
being a non-quadrilateral polygonal array of pixels; 

programming for determining and storing for the pixels in the sample pixel 
block correlations to the reference pixel block and the first reference pixel; and 

programming for identifying from the correlations a first sample pixel
corresponding to the first reference pixel.

42. The medium of claim 41 in which the first reference pixel and the first
sample pixel are included in arbitrary first and second image features in the first and
second video image frames, respectively, the arbitrary first image feature having an
interior that is bounded by an image feature perimeter and the reference pixel block
including a pixel block perimeter that conforms to the image feature perimeter, the
medium further comprising:

programming for defining relative to the first reference pixel a preliminary
quadrilateral reference pixel block of plural pixels;

programming for identifying the pixels of the preliminary quadrilateral pixel
block as to whether they are in the interior of the arbitrary first image feature, at
least one of the pixels in the preliminary quadrilateral pixel block not being in the
interior of the arbitrary first image feature; and

programming for establishing as the reference pixel block the pixels of the
preliminary quadrilateral pixel block in the interior of the arbitrary first image
feature.

43. The medium of claim 41 in which the first reference pixel and the first
sample pixel are included in arbitrary first and second image features in the first and
second video image frames, respectively, the arbitrary first image feature having
plural image feature pixels in an interior that is bounded by an image feature
perimeter, the reference pixel block including a pixel block perimeter that conforms
to the image feature perimeter, the medium further comprising:

programming for defining a reference pixel block of multiple pixels relative
to each image feature pixel in the first video image frame and defining a
corresponding sample pixel block of multiple sample pixels in the second video
image frame, at least one of the reference pixel blocks being a non-quadrilateral
polygonal array of pixels;

programming for determining and storing for the pixels in the sample pixel
block a correlation to the image feature pixel relative to which the corresponding
reference pixel block is defined; and

programming for identifying from the correlations a selected sample pixel in
correlation with the image feature pixel relative to which the corresponding reference
pixel block is defined.

44. A block matching motion estimation method for estimating motion of
corresponding pixels between first and second video image frames, comprising:

defining first and second reference pixel blocks of multiple pixels relative to
respective first and second reference pixels in the first video image frame and a
sample pixel block of multiple sample pixels in the second video image frame, the
first reference, second reference, and sample pixel blocks including plural respective
first reference, second reference, and sample subsets of multiple pixels;

determining and storing first correlations between the sample subsets and the
first reference subsets;

determining and storing second correlations between the sample subsets and
the second reference subsets, wherein at least one of the second correlations matches
one of the first correlations and determining the at least one of the second
correlations includes retrieving the matching one of the first correlations; and

identifying from the first and second correlations first and second sample
pixels corresponding to the first and second reference pixels, respectively.

45. The method of claim 44 in which the pixels of the first and second
video image frames are arranged as regular arrays of pixels and the first and second
reference subsets are commonly aligned segments of the regular arrays.

46. The method of claim 44 in which the pixels of the first and second 
video image frames are arranged as regular arrays of rows and columns of pixels 
and the first and second reference subsets are portions of columns of the regular 
arrays. 

47. The method of claim 44 in which the first and second correlations
include multiple correlation components and a selected one of the second
correlations matches a selected first correlation and includes fewer than all the
correlation components of the selected first correlation, the method further
comprising:

retrieving the selected first correlation; and

conforming the selected first correlation to the selected second correlation.
48. The method of claim 47 in which the selected second correlation
includes a new correlation component not included in the selected first correlation
and conforming the selected first correlation includes incorporating into it the new
correlation component.

49. The method of claim 47 in which the selected second correlation 
includes a new correlation component not included in the selected first correlation 
and the selected first correlation includes a prior correlation component not included 
in the selected second correlation, wherein conforming the selected first correlation 
includes omitting the prior correlation component from and incorporating the new 
correlation component in the selected first correlation. 
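Claims 44 through 49 describe reusing correlations computed for pixel subsets when the reference block moves from one reference pixel to the next. For horizontally adjacent reference pixels and a fixed displacement, neighbouring blocks share all but one column, so each block's error is the previous one with the column that left omitted and the column that entered incorporated. The sketch below assumes grayscale pixels, summed absolute error as the correlation, column subsets (as in claim 46), and reference pixels along one row; the function name and parameters are hypothetical.

```python
def sliding_block_errors(ref, sample, top, m, n, dr, dc, first_col, count):
    """SAE of m x n reference blocks with top-left corners (top, first_col + t),
    t = 0 .. count - 1, each matched against the sample block displaced by
    (dr, dc). The caller keeps all indices inside both frames."""

    def column_error(k):
        # Correlation component for one column subset of m pixels.
        return sum(abs(ref[top + i][k] - sample[top + dr + i][k + dc])
                   for i in range(m))

    # Each column subset is evaluated once, then shared by up to n blocks.
    cols = [column_error(k) for k in range(first_col, first_col + n + count - 1)]
    total = sum(cols[:n])
    errors = [total]
    for t in range(1, count):
        # Omit the column that left the block, incorporate the one that entered.
        total += cols[t + n - 1] - cols[t - 1]
        errors.append(total)
    return errors


ref = [[(3 * y + 7 * x) % 11 for x in range(8)] for y in range(5)]
sample = [[(5 * y + 2 * x) % 13 for x in range(8)] for y in range(5)]
fast = sliding_block_errors(ref, sample, 1, 2, 3, 1, 1, 0, 4)
slow = [sum(abs(ref[1 + i][c + j] - sample[2 + i][c + 1 + j])
            for i in range(2) for j in range(3)) for c in range(4)]
print(fast == slow)  # True
```

After the first block, each further block costs about 2m additions instead of m x n, which is the saving that reusing stored correlations is meant to provide.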

50. The method of claim 44 in which at least one of the first and second 
reference pixel blocks is a non-quadrilateral polygonal array of pixels. 

51. The method of claim 44 in which the first reference pixel and the first
sample pixel are included in an arbitrary first image feature in the first and second
video image frames, respectively, the arbitrary first image feature having an interior
that is bounded by an image feature perimeter and the first reference pixel block
including a pixel block perimeter that conforms to the image feature perimeter.

52. The method of claim 44 in which determining correlations includes 
determining mean absolute errors between the sample and reference subsets of 
multiple pixels. 

53. In a block matching motion estimation method for estimating motion of 
corresponding pixels between first and second video image frames, a method of 
determining correlations between plural reference pixel blocks of multiple pixels and 
a sample pixel block of multiple sample pixels in the second video image frame, 
each reference pixel block being relative to a reference pixel in the first video image 
frame, comprising: 

defining within the reference and sample pixel blocks respective reference 
and sample subsets of multiple pixels; 

determining and storing correlations between the sample and reference 
subsets; and 

identifying from the correlations first sample pixels corresponding to the
reference pixels.

54. The method of claim 53 in which first and second correlations are 
determined between the sample subsets and the reference subsets of preceding 
reference pixel blocks and subsequent reference pixel blocks, respectively. 

55. The method of claim 54 in which at least one of the second correlations 
matches one of the first correlations and determining the at least one of the second 
correlations includes retrieving the matching one of the first correlations. 

56. The method of claim 54 in which the first and second correlations
include multiple correlation components and a selected second correlation includes
fewer than all the correlation components of a selected first correlation that matches
the selected second correlation, the method further comprising:

retrieving the selected first correlation; and 

conforming the selected first correlation to the selected second correlation. 

57. The method of claim 56 in which the selected second correlation 
includes a new correlation component not included in the selected first correlation 
and conforming the selected first correlation includes incorporating into it the new 
correlation component. 

58. The method of claim 56 in which the selected second correlation
includes a new correlation component not included in the selected first correlation 
and the selected first correlation includes a prior correlation component not included 
in the selected second correlation, wherein conforming the selected first correlation 
includes omitting the prior correlation component from and incorporating the new 
correlation component in the selected first correlation. 

59. A computer-readable medium storing computer-executable programming 
for estimating motion of corresponding pixels between first and second video image 
frames, the medium comprising: 

programming for defining within the reference and sample pixel blocks 
respective reference and sample subsets of multiple pixels; 

programming for determining and storing correlations between the sample 
and reference subsets; and 

programming for identifying from the correlations first sample pixels
corresponding to the reference pixels.

60. The medium of claim 59 further comprising programming for
determining first and second correlations between the sample subsets and the
reference subsets of preceding reference pixel blocks and subsequent reference pixel
blocks, respectively.

61. The medium of claim 60 further comprising programming for 
determining that a selected second correlation matches a selected first correlation and 
programming for determining the selected second correlation with retrieval of the 
matching selected first correlation. 

62. The medium of claim 60 in which the first and second correlations have
multiple correlation components and a selected second correlation has fewer than all
the correlation components of the matching one of the first correlations, the medium
further comprising:

programming for retrieving the matching one of the first correlations; and

programming for conforming the matching one of the first correlations to the
at least one of the second correlations.

63. A data structure stored on a computer-readable medium and representing
an estimation of motion of corresponding pixels between first and second video
image frames, the first video image frame including first and second reference pixels
relative to which respective first and second reference pixel blocks are defined, and
the second video image frame including a sample pixel block of multiple sample
pixels, comprising:

reference and sample pixel block subset data representing multiple pixel
subsets of the reference and sample pixel blocks; and

subset correlation data representing correlations between the multiple pixel
subsets of the reference and sample pixel blocks.

64. A precompression video transformation method of transforming an
arbitrary image feature with a feature boundary of arbitrary configuration to an
image component of predetermined configuration for encoding in a compressed
format, the arbitrary image feature and feature boundary including plural feature
pixels with associated pixel values and the image component including the plural
feature pixels and plural non-feature pixels, the method comprising:

defining the image component of predetermined configuration about the 
image feature and identifying the non-feature pixels; 

identifying plural non-feature pixel sets each of plural adjacent non-feature
pixels and including at least one non-feature pixel adjacent a feature pixel of the
feature boundary; and

assigning to the non-feature pixels in each non-feature pixel set a pixel value 
that includes the pixel value of the feature pixel of the feature boundary adjacent the 
at least one of the non-feature pixels in the non-feature pixel set. 

65. The method of claim 64 in which the image component includes
non-feature pixels other than the ones in the plural non-feature pixel sets, the method
further comprising: 

identifying as unassigned pixels the non-feature pixels in the image
component not included in the non-feature pixel sets;

identifying pairs of non-feature pixel sets adjacent the unassigned pixels; and

assigning to unassigned pixels pixel values that include the pixel values of
the adjacent pairs of non-feature pixel sets.

66. The method of claim 64 in which the feature and non-feature pixels of
the image component are arranged as an array of rows and columns of pixels and
the non-feature pixel sets include a row and a column of non-feature pixels.

67. The method of claim 64 in which selected pairs of non-feature pixel sets 
include common non-feature pixels and the common non-feature pixels include the 
pixel values assigned to the non-feature pixels of both non-feature pixel sets. 

68. The method of claim 67 in which the common non-feature pixels are
assigned pixel values that include averages of the pixel values assigned to the
non-feature pixels of both non-feature pixel sets.

69. The method of claim 64 in which non-feature pixels in at least one of 
the non-feature pixel sets are assigned the pixel value of the adjacent feature pixel of 
the feature boundary. 

70. The method of claim 64 in which non-feature pixels in one non-feature
pixel set are assigned the pixel value of the feature pixel of the feature boundary
adjacent the at least one of the non-feature pixels in the non-feature pixel set.

71. A precompression video transformation method of transforming an 
arbitrary image feature with a feature boundary of arbitrary configuration to an 
image component of predetermined configuration for encoding in a compressed 
format, the arbitrary image feature and feature boundary including plural feature 
pixels with associated pixel values and the image component including the plural 
feature pixels and plural non-feature pixels, the method comprising: 

assigning to each of the non-feature pixels of the image component a pixel 
value that includes the pixel value of a feature pixel of the feature boundary. 

72. The method of claim 71 in which selected non-feature pixels in the 
image component are assigned pixel values of plural feature pixels of the feature 
boundary. 

73. The method of claim 72 in which the selected non-feature pixels are
assigned pixel values that include an average of the pixel values of plural feature
pixels of the feature boundary.

74. The method of claim 71 in which non-feature pixels in the image 
component are assigned the pixel values of feature pixels of the feature boundary. 

75. The method of claim 71 in which the feature and non-feature pixels of
the image component are arranged as an array of rows and columns of pixels, the 
method further comprising: 

identifying selected non-feature pixels of the image component that are in 
rows or columns with feature pixels of the feature boundary; and 

assigning to the selected non-feature pixels pixel values that include the pixel 
values of the feature pixels of the feature boundary. 

76. A computer-readable medium storing computer-executable programming 
for transforming an arbitrary image feature with a feature boundary of arbitrary 
configuration to an image component of predetermined configuration for encoding in 
a compressed format, the arbitrary image feature and feature boundary including 
plural feature pixels with associated pixel values and the image component including 
the plural feature pixels and plural non-feature pixels, the medium comprising: 

programming for assigning to each of the non-feature pixels of the image 
component a pixel value that includes the pixel value of a feature pixel of the
feature boundary. 

77. The medium of claim 15 further comprising programming for assigning
to selected non-feature pixels in the image component pixel values of plural feature
pixels of the feature boundary.

78. A data structure stored on a computer-readable medium and representing
a precompression extrapolation of an arbitrary image feature with a feature boundary
of arbitrary configuration to an image component of predetermined configuration,
the arbitrary image feature and feature boundary including plural feature pixels with
associated pixel values and the image component including the plural feature pixels
and plural non-feature pixels, the data structure comprising:

image feature data representing the arbitrary image feature and the feature 
boundary of arbitrary configuration and including pixel values of pixels in the 
feature boundary; 

image component data representing the image component of predetermined
configuration about the image feature; and

non-feature pixel data representing the non-feature pixels with pixel values
that include the pixel values of the pixels in the feature boundary.

79. The data structure of claim 78 in which the non-feature pixel data
includes pixel values that represent non-feature pixels with pixel values that are the
same as pixel values of pixels in the feature boundary.

80. The data structure of claim 78 in which the non-feature pixel data
includes pixel values that represent non-feature pixels with pixel values that are
averages of pixel values of pixels in the feature boundary.
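The correlation reuse recited in claims 56 to 58 and 60 to 62 can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each correlation component is the sum of absolute differences (SAD) over one column of pixels, and that the sample block moves together with the reference block at a fixed displacement (dy, dx). Under those assumptions horizontally adjacent reference blocks share all but one column component. The function names `column_sad` and `adjacent_block_sads` are hypothetical and are not taken from the specification.

```python
from collections import deque

import numpy as np


def column_sad(reference_frame, sample_frame, top, col, size, dy, dx):
    """SAD between one reference column subset and the sample column displaced by (dy, dx)."""
    ref = reference_frame[top:top + size, col].astype(np.int64)
    smp = sample_frame[top + dy:top + dy + size, col + dx].astype(np.int64)
    return int(np.abs(ref - smp).sum())


def adjacent_block_sads(reference_frame, sample_frame, top, size, dy, dx, count):
    """Block SADs for `count` horizontally adjacent size x size reference blocks
    (left edges 0 .. count-1) at one fixed displacement (dy, dx).

    Each block SAD is the sum of `size` column components. Moving one pixel to
    the right omits the prior leftmost component and incorporates one new
    component; the other size - 1 components are reused, not recomputed.
    Frame bounds are not checked (assumed to be the caller's responsibility).
    """
    components = deque(
        column_sad(reference_frame, sample_frame, top, c, size, dy, dx)
        for c in range(size)
    )
    total = sum(components)
    totals = [total]
    for left in range(1, count):
        total -= components.popleft()  # omit the prior correlation component
        new = column_sad(reference_frame, sample_frame, top,
                         left + size - 1, size, dy, dx)
        components.append(new)  # incorporate the new correlation component
        total += new
        totals.append(total)
    return totals
```

Each step costs one column of differences instead of a full block, which is the saving the conforming step of claim 58 is aimed at in this reading.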
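Claims 55, 61 and 63 describe storing subset correlations so that a matching correlation can be retrieved instead of recomputed. A minimal sketch follows, assuming that a pixel subset is one column of pixels and that its correlation is a sum of absolute differences (SAD) against the displaced sample column. The class `SubsetCorrelationData`, its fields, and its keying scheme are hypothetical illustrations of the subset data and subset correlation data of claim 63, not the specification's layout.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class SubsetCorrelationData:
    """Hypothetical store mapping a (subset, displacement) key to its correlation.

    A subset is one column of `size` pixels starting at row `top`; its
    correlation is the SAD against the sample column displaced by (dy, dx).
    """
    reference_frame: np.ndarray
    sample_frame: np.ndarray
    size: int
    correlations: dict = field(default_factory=dict)
    computed: int = 0  # number of correlations actually calculated

    def subset_correlation(self, top, col, dy, dx):
        key = (top, col, dy, dx)
        if key not in self.correlations:  # calculate once, then retrieve
            ref = self.reference_frame[top:top + self.size, col].astype(np.int64)
            smp = self.sample_frame[top + dy:top + dy + self.size,
                                    col + dx].astype(np.int64)
            self.correlations[key] = int(np.abs(ref - smp).sum())
            self.computed += 1
        return self.correlations[key]

    def block_correlation(self, top, left, dy, dx):
        """Block correlation as the sum of its stored or newly computed subset correlations."""
        return sum(self.subset_correlation(top, left + c, dy, dx)
                   for c in range(self.size))
```

With `size=3`, evaluating two horizontally adjacent blocks at the same displacement calculates four column correlations in total: the second block retrieves two stored correlations and calculates only one new one.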
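The row-and-column extrapolation of claims 64 and 66 to 68, and the row or column selection of claim 75, can be sketched in Python under the following assumptions, which are illustrative rather than the specification's procedure. A non-feature pixel set is taken to be the run of non-feature pixels between one end of a row or column and the nearest feature pixel in that row or column, which lies on the feature boundary. Pixels covered by both a row set and a column set receive the average of the two values. Pixels covered by neither are left unassigned as NaN, corresponding to the unassigned pixels of claim 65. The function names are hypothetical.

```python
import numpy as np


def fill_row_ends(values, feature_mask):
    """Assign each row's leading and trailing non-feature runs the value of the
    adjacent boundary pixel; every other pixel is NaN in the result."""
    out = np.full(values.shape, np.nan)
    for r in range(values.shape[0]):
        cols = np.flatnonzero(feature_mask[r])
        if cols.size == 0:
            continue  # row has no feature pixels, so it forms no row set
        out[r, :cols[0]] = values[r, cols[0]]        # run left of the boundary pixel
        out[r, cols[-1] + 1:] = values[r, cols[-1]]  # run right of the boundary pixel
    return out


def extrapolate_feature(values, feature_mask):
    """Combine row sets and column sets, averaging pixels common to both."""
    row_fill = fill_row_ends(values, feature_mask)
    col_fill = fill_row_ends(values.T, feature_mask.T).T  # columns via transpose
    count = (~np.isnan(row_fill)).astype(int) + (~np.isnan(col_fill)).astype(int)
    total = np.nan_to_num(row_fill) + np.nan_to_num(col_fill)
    result = np.full(values.shape, np.nan)
    covered = count > 0
    result[covered] = total[covered] / count[covered]
    result[feature_mask] = values[feature_mask]  # feature pixels keep their own values
    return result
```

Runs lying between two feature pixels of the same row or column, and corner regions reached by neither pass, stay NaN; the pair-based assignment of claim 65 that would fill them is not implemented here.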